I. INTRODUCTION
A. Setting the stage 

The Marčenko-Pastur 1967 paper [1] on the spectrum of empirical correlation matrices is both remarkable and
precocious. It turned out to be useful in many very different contexts (neural networks, image processing, wireless
communications, etc.) and was unknowingly rediscovered several times. Its primary aim, as a new statistical tool to
analyse large dimensional data sets, only became relevant in the last two decades, when the storage and handling of
humongous data sets became routine in almost all fields: physics, image analysis, genomics, epidemiology, engineering,
economics and finance, to quote only a few. It is indeed very natural to try to identify common causes (or factors)
that explain the dynamics of N quantities. These quantities might be daily returns of the different stocks of the S&P
500, monthly inflation of different sectors of activity, motion of individual grains in a packed granular medium, or
different biological indicators (blood pressure, cholesterol, etc.) within a population (for reviews of other
applications and techniques, see [...]). We will denote by T the total number of observations of each of the N
quantities. In the example of stock returns, T is the total number of trading days in the sampled data; in the
biological example, T is the size of the population. The realization of the $i$th quantity ($i = 1, \dots, N$) at "time" $t$
($t = 1, \dots, T$) will be denoted $r_i^t$, which will be assumed in the following to be demeaned and standardized. The
normalized $T \times N$ matrix of returns will be denoted as $X$: $X_{ti} = r_i^t/\sqrt{T}$. The simplest way to characterize the
correlations between these quantities is to compute the Pearson estimator of the correlation matrix:

$$E_{ij} = \frac{1}{T}\sum_{t=1}^{T} r_i^t r_j^t \equiv \left(X^T X\right)_{ij}, \tag{1}$$

where E will denote the empirical correlation matrix (i.e. on a given realization), that one must carefully distinguish
from the "true" correlation matrix C of the underlying statistical process (that might not even exist). In fact, the
whole point of the Marčenko-Pastur result is to characterize the difference between E and C. Of course, if N is small
(say $N = 4$) and the number of observations is large (say $T = 10^6$), then we can intuitively expect that any observable
computed using E will be very close to its "true" value, computed using C. For example, a consistent estimator of
$\mathrm{Tr}\, C^{-1}$ is given by $\mathrm{Tr}\, E^{-1}$ when T is large enough for a fixed N. This is the usual limit considered in statistics. However,
in many applications where T is large, the number of observables N is also large, such that the ratio q = N/T is not
very small compared to one. We will find below that when q is non zero, and for large N, $\mathrm{Tr}\, E^{-1} = \mathrm{Tr}\, C^{-1}/(1 - q)$.
Typical numbers in the case of stocks are N = 500 and T = 2500, corresponding to 10 years of daily data, already quite
a long strand compared to the lifetime of stocks or the expected structural evolution time of markets. For inflation
indicators, 20 years of monthly data produce a meager T = 240, whereas the number of sectors of activity for which
inflation is recorded is around N = 30. The relevant mathematical limit to focus on in these cases is $T \gg 1$, $N \gg 1$
but with $q = N/T = O(1)$. The aim of this paper is to review several Random Matrix Theory (RMT) results that can
be established in this special asymptotic limit, where the empirical density of eigenvalues (the spectrum) is strongly
distorted when compared to the 'true' density (corresponding to $q \to 0$). When $T \to \infty$, $N \to \infty$, the spectrum
has some degree of universality with respect to the distribution of the $r_i^t$'s; this makes RMT results particularly
appealing. Although the scope of these results is much broader (as alluded to above), we will restrict our discussion to
the applications of RMT to financial markets, a topic to which a considerable number of papers have been devoted
in the last decade (see e.g. [...]). The following
mini-review is intended to guide the reader through various results that we consider to be important, with no claim
of being complete. We furthermore chose to state these results in a narrative style, rather than in a more rigorous
Lemma-Theorem fashion. We provide references where more precise statements can be found.
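
The relation $\mathrm{Tr}\, E^{-1} = \mathrm{Tr}\, C^{-1}/(1 - q)$ quoted above can be checked numerically. The following is a minimal sketch, assuming IID Gaussian returns with C equal to the identity; the sizes N and T and the random seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 1000            # q = N/T = 0.2
q = N / T

# IID unit-variance returns, so the "true" correlation matrix C is the identity
r = rng.standard_normal((T, N))
X = r / np.sqrt(T)          # normalized T x N matrix of returns
E = X.T @ X                 # Pearson estimator of the correlation matrix

tr_E_inv = np.trace(np.linalg.inv(E)) / N   # normalized trace of E^{-1}
tr_C_inv = 1.0                               # normalized trace of C^{-1} when C is the identity
print(tr_E_inv, tr_C_inv / (1 - q))
```

Even though T is five times larger than N here, the two printed numbers should both be close to 1.25 rather than to 1, which illustrates why the limit of fixed q matters in practice.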



B. Principal Component Analysis 



The correlation matrix defined above is by construction an N × N symmetric matrix, that can be diagonalized.
This is the basis of the well known Principal Component Analysis (PCA), aiming at decomposing the fluctuations of
the quantity $r_i^t$ into decorrelated contributions (the 'components') of decreasing variance. In terms of the eigenvalues
$\lambda_\alpha$ and eigenvectors $\vec V_\alpha$, the decomposition reads:

$$r_i^t = \sum_{\alpha=1}^{N} \sqrt{\lambda_\alpha}\, V_{\alpha,i}\, \epsilon_\alpha^t, \tag{2}$$

where $V_{\alpha,i}$ is the i-th component of $\vec V_\alpha$, and the $\epsilon_\alpha^t$ are uncorrelated (for different $\alpha$'s) random variables of unit variance.
Note that the $\epsilon_\alpha^t$ are not necessarily uncorrelated in the "time" direction, and not necessarily Gaussian. This PCA
decomposition is particularly useful when there is a strong separation between eigenvalues. For example if the largest
eigenvalue $\lambda_1$ is much larger than all the others, a good approximation of the dynamics of the N variables reads:

$$r_i^t \approx \sqrt{\lambda_1}\, V_{1,i}\, \epsilon_1^t, \tag{3}$$

in which case a single "factor" is enough to capture the phenomenon. When N is fixed and $T \to \infty$, all the eigenvalues
and their corresponding eigenvectors can be trusted to extract meaningful information. As we will review in detail
below, this is not the case when $q = N/T = O(1)$, where only a subpart of the eigenvalue spectrum of the 'true' matrix
C can be reliably estimated. In fact, since E is by construction a sum of T projectors, E has (generically) $(N - T)^+$
eigenvalues exactly equal to zero, corresponding to the $(N - T)^+$ dimensions not spanned by these T projectors
(with $(x)^+ = \max(x, 0)$). These zero eigenvalues are clearly spurious and do not correspond to anything real for C.
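
The counting of spurious zero eigenvalues is easy to verify. The sketch below assumes IID Gaussian data with more variables than observations; the sizes and the numerical threshold `1e-10` are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 50, 20                       # more variables than observations
X = rng.standard_normal((T, N)) / np.sqrt(T)
E = X.T @ X                         # sum of T rank-one projectors

eigvals = np.linalg.eigvalsh(E)     # returned in ascending order
n_zero = int(np.sum(eigvals < 1e-10))
print(n_zero, max(N - T, 0))
```

The N × N matrix E has rank at most T, so N − T of its eigenvalues vanish up to rounding errors, whatever the true matrix C is.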

It is useful to give early on a physical (or rather financial) interpretation of the eigenvectors $\vec V_\alpha$. The list of numbers
$V_{\alpha,i}$ can be seen as the weights of the different stocks $i = 1, \dots, N$ in a certain portfolio $\Pi_\alpha$, where some stocks are
'long' ($V_{\alpha,i} > 0$) while others are 'short' ($V_{\alpha,i} < 0$). The realized risk $\mathcal{R}_\alpha^2$ of portfolio $\Pi_\alpha$, as measured by the variance
of its returns, is given by:

$$\mathcal{R}_\alpha^2 = \frac{1}{T}\sum_t \Big(\sum_i V_{\alpha,i}\, r_i^t\Big)^2 = \sum_{ij} V_{\alpha,i} V_{\alpha,j} E_{ij} \equiv \lambda_\alpha. \tag{4}$$



The eigenvalue $\lambda_\alpha$ is therefore the risk of the investment in portfolio $\alpha$. Large eigenvalues correspond to a risky mix
of assets, whereas small eigenvalues correspond to a particularly quiet mix of assets. Typically, in stock markets, the
largest eigenvalue corresponds to investing roughly equally on all stocks: $V_{1,i} = 1/\sqrt{N}$. This is called the 'market
mode' and is strongly correlated with the market index. There is no diversification in this portfolio: the only bet
is whether the market as a whole will go up or down, which is why the risk is large. Conversely, if two stocks move
very tightly together (the canonical example would be Coca-Cola and Pepsi), then buying one and selling the other
leads to a portfolio that hardly moves, being only sensitive to events that strongly differentiate the two companies.
Correspondingly, there is a small eigenvalue of E with eigenvector close to $(0, 0, \dots, \sqrt{2}/2, 0, \dots, -\sqrt{2}/2, 0, \dots, 0, 0)$,
where the non zero components are localized on the pair of stocks.

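A further property of the portfolios $\Pi_\alpha$ is that their returns are uncorrelated, since:

$$\frac{1}{T}\sum_t \Big(\sum_i V_{\alpha,i}\, r_i^t\Big)\Big(\sum_j V_{\beta,j}\, r_j^t\Big) = \sum_{ij} V_{\alpha,i} V_{\beta,j} E_{ij} \equiv \lambda_\alpha \delta_{\alpha,\beta}. \tag{5}$$

The PCA of the correlation matrix therefore provides a list of 'eigenportfolios', corresponding to uncorrelated
investments with decreasing variance.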
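
The eigenportfolio construction can be sketched in a few lines. The example below assumes a hypothetical one-factor model (a common "market" factor plus idiosyncratic noise, with loadings 0.6 and 0.8 chosen for illustration); none of these numbers come from real data.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 5, 2000
# Hypothetical one-factor returns: a common "market" factor plus idiosyncratic noise
market = rng.standard_normal((T, 1))
r = 0.6 * market + 0.8 * rng.standard_normal((T, N))
r = (r - r.mean(axis=0)) / r.std(axis=0)      # demean and standardize each column

E = r.T @ r / T                                # empirical correlation matrix
lam, V = np.linalg.eigh(E)                     # columns of V are the eigenvectors
lam, V = lam[::-1], V[:, ::-1]                 # sort by decreasing variance

pi = r @ V                                     # returns of the N eigenportfolios, shape (T, N)
cov_pi = pi.T @ pi / T                         # realized covariance of the eigenportfolio returns
print(np.round(cov_pi, 3))                     # diagonal, with the eigenvalues on the diagonal
print(np.round(V[:, 0] * np.sqrt(N), 2))       # market mode: similar weights, up to an overall sign
```

The realized covariance of the eigenportfolio returns is diagonal by construction, and in this toy model the top eigenvector spreads its weight almost evenly over all assets, as expected for a market mode.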

We should mention at this stage an interesting duality that, although trivial from a mathematical point of view,
looks at first rather counter-intuitive. Instead of the N × N correlation matrix of the stock returns, one could define
a T × T correlation matrix $\tilde E$ of the daily returns, as:

$$\tilde E_{tt'} = \frac{1}{N}\sum_i r_i^t r_i^{t'} = \frac{T}{N}\left(X X^T\right)_{tt'}. \tag{6}$$

This measures how similar day $t$ and day $t'$ are, in terms of the 'pattern' created by the returns of the N stocks. The
duality we are speaking about is that the non zero eigenvalues of $\tilde E$ and of E are precisely the same, up to a factor
T/N. This is obvious from Eq. (2), where the $V_{\alpha,i}$ and the $\epsilon_\alpha^t$ play completely symmetric roles: the fact that the
$\epsilon_\alpha^t$ are uncorrelated for different $\alpha$'s means that these vectors of dimension T are orthogonal, as are the $V_{\alpha,i}$. Using this
decomposition, one indeed finds:

$$\tilde E_{tt'} = \frac{1}{N}\sum_\alpha \lambda_\alpha\, \epsilon_\alpha^t\, \epsilon_\alpha^{t'}, \tag{7}$$

showing that the non zero eigenvalues of $\tilde E$ are indeed the $\lambda_\alpha$'s (up to a factor $1/q$). The corresponding eigenvectors of
$\tilde E$ are simply the lists of the daily returns of the portfolios $\Pi_\alpha$. Of course, if T > N, $\tilde E$ has T − N additional zero
eigenvalues.
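
This duality is straightforward to confirm numerically. The sketch below assumes IID Gaussian returns and arbitrary sizes with T > N; it compares the spectrum of the N × N matrix with that of the T × T matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 30, 100
q = N / T
r = rng.standard_normal((T, N))
X = r / np.sqrt(T)

E = X.T @ X                       # N x N correlation matrix of the stocks
E_dual = (T / N) * (X @ X.T)      # T x T correlation matrix of the days

lam = np.sort(np.linalg.eigvalsh(E))[::-1]
lam_dual = np.sort(np.linalg.eigvalsh(E_dual))[::-1]

nonzero_dual = lam_dual[:N]       # the remaining T - N eigenvalues vanish
print(np.allclose(nonzero_dual, lam / q), np.max(np.abs(lam_dual[N:])))
```

The N largest eigenvalues of the T × T matrix coincide with those of E divided by q, and the other T − N are zero up to rounding errors.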



II. RETURN STATISTICS AND PORTFOLIO THEORY 



A. Single asset returns: a short review 



Quite far from the simple assumption of textbook mathematical finance, the returns (i.e. the relative price changes)
of any kind of traded financial instrument (stocks, currencies, interest rates, commodities, etc. [63]) are very far from
Gaussian. The unconditional distribution of returns has fat tails, decaying as a power law for large arguments. In
fact, the empirical probability distribution function of returns on shortish time scales (say between a few minutes and
a few days) can be reasonably well fitted by a Student-t distribution (see e.g. [25]) [65]:

$$P(r) = \frac{1}{\sqrt{\pi}}\, \frac{\Gamma\left(\frac{1+\mu}{2}\right)}{\Gamma\left(\frac{\mu}{2}\right)}\, \frac{a^\mu}{(r^2 + a^2)^{\frac{1+\mu}{2}}}, \tag{8}$$

where $a$ is a parameter related to the variance of the distribution through $\sigma^2 = a^2/(\mu - 2)$, and $\mu$ is in the range 3 to
5 [...]. We assume here and in the following that the returns have zero mean, which is appropriate for short enough
time scales: any long term drift is generally negligible compared to $\sigma$ for time scales up to a few weeks.

This unconditional distribution can however be misleading, since returns are in fact very far from IID random
variables. In other words, the returns cannot be thought of as independently drawn Student random variables. For
one thing, such a model predicts that upon time aggregation, the distribution of returns is the convolution of Student
distributions, which converges far too quickly towards a Gaussian distribution of returns for longer time scales. In
intuitive terms, the volatility of financial returns is itself a dynamical variable, that changes over time with a broad
distribution of characteristic frequencies. In more formal terms, the return at time $t$ can be represented by the product
of a volatility component $\sigma^t$ and a directional component $\xi^t$ (see e.g. [25]):

$$r^t = \sigma^t \xi^t, \tag{9}$$

where the $\xi^t$ are IID random variables of unit variance, and $\sigma^t$ a positive random variable with both fast and slow
components. It is to a large extent a matter of taste to choose $\xi^t$ to be Gaussian and keep a high frequency,
unpredictable part to $\sigma^t$, or to choose $\xi^t$ to be non-Gaussian (for example Student-t distributed [...]) and only keep
the low frequency, predictable part of $\sigma^t$. The slow part of $\sigma^t$ is found to be a long memory process, such that its
correlation function decays as a slow power-law of the time lag $\tau$ (see [...] and references therein) [67]:

$$\langle \sigma^t \sigma^{t+\tau} \rangle - \langle \sigma \rangle^2 \propto \tau^{-\nu}. \tag{10}$$

It is worth insisting that in Eq. (9), $\sigma^t$ and $\xi^t$ are in fact not independent. It is indeed well documented that on stock
markets negative past returns tend to increase future volatilities, and vice-versa [...]. This is called the 'leverage'
effect, and means in particular that the average of quantities such as $\xi^t \sigma^{t+\tau}$ is negative when $\tau > 0$.
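
To see how a fluctuating volatility alone produces fat tails, here is a minimal sketch of the product decomposition of Eq. (9). The log-normal volatility, drawn independently at each time step with a hypothetical width `s`, is an assumption made purely for illustration: it ignores both the long memory of the volatility and the leverage effect described above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
s = 0.3                                      # hypothetical width of the log-volatility

xi = rng.standard_normal(n)                  # directional component, unit variance
sigma = np.exp(s * rng.standard_normal(n))   # positive random volatility (log-normal, an assumption)
r = sigma * xi                               # product decomposition of the return

def kurtosis(x):
    """Fourth moment over squared second moment (equals 3 for a Gaussian)."""
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2

print(kurtosis(xi), kurtosis(r), 3 * np.exp(4 * s**2))
```

For this toy model the kurtosis of r is $3\,e^{4s^2}$, larger than the Gaussian value 3 as soon as the volatility fluctuates, even though the directional component is exactly Gaussian.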



B. Multivariate distribution of returns 



Having now specified the monovariate statistics of returns, we want to extend this description to the joint
distribution of the returns of N correlated assets. We will first focus on the joint distribution of simultaneous returns
$\{r_1^t, \dots, r_N^t\}$. Clearly, all marginals of this joint distribution must resemble the Student-t distribution (8) above;
furthermore, it must be compatible with the (true) correlation matrix of the returns:

$$C_{ij} = \int \prod_k \left[dr_k\right]\, r_i r_j\, P(r_1, r_2, \dots, r_N). \tag{11}$$

Needless to say, these two requirements are weak constraints that can be fulfilled by the joint distribution
$P(r_1, r_2, \dots, r_N)$ in an infinite number of ways. This is referred to as the 'copula specification problem' in
quantitative finance. A copula is a joint distribution of N random variables $u_i$ that all have a uniform marginal distribution
in [0, 1]; this can be transformed into $P(r_1, r_2, \dots, r_N)$ by transforming each $u_i$ into $r_i = F_i^{-1}(u_i)$, where $F_i$ is the
(exact) cumulative marginal distribution of $r_i$. The fact that the copula problem is hugely under-constrained has
led to a proliferation of possible candidates for the structure of financial asset correlations (for a review, see e.g.
[28, 29, 30, 33]). Unfortunately, the proposed copulas are often chosen because of mathematical convenience rather
than based on a plausible underlying mechanism. From that point of view, many copulas appearing in the literature
are in fact very unnatural.

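There is however a natural extension of the monovariate Student-t distribution that has a clear financial
interpretation. If we generalize the above decomposition Eq. (9) as:

$$r_i^t = s_i\, \sigma^t\, \xi_i^t, \tag{12}$$

where the $\xi_i^t$ are correlated Gaussian random variables with a correlation matrix $\hat C_{ij}$ and the volatility $\sigma^t$ is common
to all assets and distributed as:

$$P(\sigma) = \frac{2}{\Gamma\left(\frac{\mu}{2}\right)}\, \exp\left[-\frac{\sigma_0^2}{\sigma^2}\right] \frac{\sigma_0^\mu}{\sigma^{1+\mu}}, \tag{13}$$

where $\sigma_0^2 = 2\mu/(\mu - 2)$ in such a way that $\langle \sigma^2 \rangle = 1$, such that $s_i$ is the volatility of the stock $i$. The joint distribution
of returns is then a multivariate Student $P_S$ that reads explicitly:

$$P_S(r_1, r_2, \dots, r_N) = \frac{\Gamma\left(\frac{N+\mu}{2}\right)}{\Gamma\left(\frac{\mu}{2}\right)\sqrt{(\mu\pi)^N \det \hat C}}\; \frac{1}{\left(1 + \frac{1}{\mu}\sum_{ij} r_i\, (\hat C^{-1})_{ij}\, r_j\right)^{\frac{N+\mu}{2}}}, \tag{14}$$

where we have normalized returns so that $s_i = 1$. Let us list a few useful properties of this model: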



• The marginal distribution of any $r_i$ is a monovariate Student-t distribution of parameter $\mu$.

• In the limit $\mu \to \infty$, one can show that the multivariate Student distribution $P_S$ tends towards a multivariate
Gaussian distribution. This is expected, since in this limit, the random volatility $\sigma$ does not fluctuate anymore
and is equal to 1.

• The correlation matrix of the $r_i$ is given, for $\mu > 2$, by:

$$C_{ij} = \langle r_i r_j \rangle = \frac{\mu}{\mu - 2}\, \hat C_{ij}. \tag{15}$$

• Wick's theorem for Gaussian variables can be extended to Student variables. For example, one can show that:

$$\langle r_i r_j r_k r_l \rangle = \frac{\mu - 2}{\mu - 4}\left[C_{ij}C_{kl} + C_{ik}C_{jl} + C_{il}C_{jk}\right]. \tag{16}$$

This shows explicitly that uncorrelated Student variables are not independent. Indeed, even when $C_{ij} = 0$,
the correlation of squared returns is positive:

$$\langle r_i^2 r_j^2 \rangle - \langle r_i^2 \rangle \langle r_j^2 \rangle = \frac{2}{\mu - 4}\, C_{ii} C_{jj} > 0. \tag{17}$$

• Finally, note the matrix $\hat C_{ij}$ can be estimated from empirical data using a maximum likelihood procedure. Given
a time series of stock returns $r_i^t$, the most likely matrix $\hat C_{ij}$ is given by the solution of the following equation:

$$\hat C_{ij} = \frac{N + \mu}{T} \sum_{t=1}^{T} \frac{r_i^t r_j^t}{\mu + \sum_{mn} r_m^t\, (\hat C^{-1})_{mn}\, r_n^t}. \tag{18}$$

Note that in the Gaussian limit $\mu \to \infty$ for a fixed N, the denominator of the above expression is simply given
by $\mu$, and the final expression is simply:

$$\hat C_{ij} = C_{ij} = \frac{1}{T} \sum_{t=1}^{T} r_i^t r_j^t, \tag{19}$$

as it should be.
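
Several of the properties listed above can be illustrated numerically. The sketch below is written under explicit assumptions: the common volatility is drawn as $\sigma^2 = \mu/\chi^2_\mu$ (a normalization chosen here so that the covariance of the returns is $\mu/(\mu-2)$ times $\hat C$, as in Eq. (15)), the 3 × 3 matrix `C_hat` is hypothetical, and $\mu = 12$ is taken large only so that the sample moments converge quickly. The helper `student_ml` is an illustrative fixed-point iteration of Eq. (18), not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, mu = 3, 200_000, 12.0
C_hat = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])        # hypothetical Gaussian correlation matrix

# Common volatility: sigma^2 = mu / chi2_mu (assumed normalization, <sigma^2> = mu/(mu-2))
xi = rng.multivariate_normal(np.zeros(N), C_hat, size=T)
sigma = np.sqrt(mu / rng.chisquare(mu, size=T))
r = sigma[:, None] * xi                    # returns with a single volatility factor, s_i = 1

C_emp = r.T @ r / T                        # to be compared with mu/(mu-2) * C_hat
# Assets 0 and 2 are uncorrelated, yet their squared returns are positively correlated
sq_cov = np.mean(r[:, 0]**2 * r[:, 2]**2) - np.mean(r[:, 0]**2) * np.mean(r[:, 2]**2)

def student_ml(r, mu, n_iter=50):
    """Fixed-point iteration of the maximum likelihood equation, started from the identity."""
    T, N = r.shape
    C = np.eye(N)
    for _ in range(n_iter):
        # quadratic form r^T C^{-1} r for every observation t
        quad = np.einsum("ti,ij,tj->t", r, np.linalg.inv(C), r)
        weights = (N + mu) / (mu + quad)
        C = (r * weights[:, None]).T @ r / T
    return C

C_ml = student_ml(r, mu)
print(np.round(C_emp, 2), np.round(sq_cov, 2), np.round(C_ml, 2), sep="\n")
```

The maximum likelihood iteration downweights the observations with a large quadratic form, i.e. the days on which the common volatility was presumably high, which is why it recovers $\hat C$ rather than the rescaled matrix of Eq. (15).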

This multivariate Student model is in fact too simple to describe financial data since it assumes that there is a unique
volatility factor, common to all assets. One expects that in reality several volatility factors are needed. However, the
precise implementation of this idea and the resulting form of the multivariate distribution (and the corresponding
natural copula) has not been worked out in detail and is still very much a research topic.

Before leaving this section, we should mention the role of the observation frequency, i.e. the time lag used to define
price returns. Qualitatively, all the above discussion applies as soon as one can forget about price discretization effects
(a few minutes on actively traded stocks) up to a few days, before a progressive 'gaussianization' of returns takes place.
Quantitatively, however, some measurable evolution with the time lag can be observed. One important effect for our
purpose here is the so-called Epps effect, i.e. the fact that the correlation of returns $r_i$ and $r_j$ tends to increase with
the time lag, quite strongly between 5 minutes and 30 minutes, then more slowly before apparently saturating after a
few days [31, 32]. A simple mechanism for such an increase (apart from artefacts coming from microstructural effects
and stale prices) is pair trading. Imagine two stocks $i$ and $j$ known to be similar to each other (say, as mentioned
above, Coca-Cola and Pepsi). Then the evolution of one stock, due to some idiosyncratic effect, is expected to drive the
other through the impact of pair traders. One can write down a mathematical model for this, and compute the lag
dependence of the returns, but it is quite clear that the time scale over which the correlation coefficient converges
towards its low frequency value is directly related to the strength of the pair trading effect.

The Epps effect is very important since one might have hoped that increasing the frequency of observations allows 
one to have effectively longer samples of returns to estimate the correlation matrix, thereby increasing the quality 
factor Q = T/N alluded to in the introduction. One has to make sure, however, that the very object one wants to
measure, i.e. the matrix Cij , does not actually change with the observation frequency. It seems that the Epps effect 
is nowadays weaker than in the past (say before year 2000) in the sense that the correlation matrix converges faster 
towards its low frequency limit. But as we discuss in the conclusion, there might be a lot to learn from a detailed 
analysis of the ultra high frequency behaviour of the correlation matrix. 



C. Risk and portfolio theory 

Suppose one builds a portfolio of N assets with weight $w_i$ on the $i$th asset, with (daily) volatility $s_i$. If one knew
the 'true' correlation matrix $C_{ij}$, one would have access to the (daily) variance of the portfolio return, given by:

$$\mathcal{R}^2 = \sum_{ij} w_i s_i C_{ij} s_j w_j, \tag{20}$$

where $C_{ij}$ is the correlation matrix. If one has predicted gains $g_i$, then the expected gain of the portfolio is $\mathcal{G} = \sum_i w_i g_i$.

In order to measure and optimize the risk of this portfolio, one therefore has to come up with a reliable estimate
of the correlation matrix $C_{ij}$. This is difficult in general since one has to determine of the order of $N^2/2$ coefficients
out of N time series of length T, and in general T is not much larger than N. As noted in the introduction, typical
values of Q = T/N are in the range 1 to 10 in most applications. In the following we assume for simplicity that
the volatilities $s_i$ are perfectly known (an improved estimate of the future volatility over some time horizon can be
obtained using the information distilled by option markets). By redefining $w_i$ as $w_i s_i$ and $g_i$ as $g_i/s_i$, one can set
$s_i = 1$, which is our convention from now on.

The risk of a portfolio with weights $w_i$ constructed independently of the past realized returns $r_i^t$ is faithfully measured
by:

$$\mathcal{R}_E^2 = \frac{1}{T}\sum_t \Big(\sum_i w_i r_i^t\Big)^2 = \sum_{ij} w_i E_{ij} w_j, \tag{21}$$

using the empirical correlation matrix E. This estimate is unbiased and the relative mean square-error on the risk is
small ($\sim 1/T$). But when the $w$ are chosen using the observed $r$'s, as we show now, the result can be very different.

Problems indeed arise when one wants to estimate the risk of an optimized portfolio, resulting from a Markowitz
optimization scheme, which gives the portfolio with maximum expected return for a given risk or equivalently, the
minimum risk for a given return $\mathcal{G}$ (we will study the latter case below). Assuming C is known, simple calculations
using Lagrange multipliers readily yield the optimal weights $\mathbf{w}_C$, which read, in matrix notation:

$$\mathbf{w}_C = \mathcal{G}\, \frac{C^{-1}\mathbf{g}}{\mathbf{g}^T C^{-1} \mathbf{g}}. \tag{22}$$

One sees that these optimal weights involve the inverse of the correlation matrix, which will be the source of problems,
and will require a way to 'clean' the empirical correlation matrix. Let us explain why in detail.
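
The minimum-risk solution for a given target gain can be written in a few lines. The sketch below uses a hypothetical 3-asset correlation matrix and hypothetical predicted gains; the function name `markowitz_weights` is introduced here for illustration only.

```python
import numpy as np

def markowitz_weights(C, g, G=1.0):
    """Minimum-risk weights for a target gain G: w = G C^{-1} g / (g^T C^{-1} g)."""
    Cinv_g = np.linalg.solve(C, g)        # C^{-1} g without forming the inverse explicitly
    return G * Cinv_g / (g @ Cinv_g)

# Hypothetical 3-asset example (numbers chosen for illustration only)
C = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
g = np.array([0.05, 0.04, 0.03])
G = 0.04

w = markowitz_weights(C, g, G)
risk2 = w @ C @ w                          # variance of the optimal portfolio
w_naive = G * g / (g @ g)                  # another portfolio with the same expected gain
print(w, w @ g, risk2, w_naive @ C @ w_naive)
```

The optimal portfolio reaches the target gain with a variance equal to $\mathcal{G}^2/(\mathbf{g}^T C^{-1}\mathbf{g})$, which is never larger than that of any other portfolio with the same expected gain, such as the naive one proportional to $\mathbf{g}$.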

The question is to estimate the risk of this optimized portfolio, and in particular to understand the biases of different
possible estimates. We define the following three quantities [...]:






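• The "in-sample" risk, corresponding to the risk of the optimal portfolio over the period used to construct it,
using E as the correlation matrix:

$$\mathcal{R}_{\rm in}^2 = \mathbf{w}_E^T E\, \mathbf{w}_E = \frac{\mathcal{G}^2}{\mathbf{g}^T E^{-1} \mathbf{g}}. \tag{23}$$

• The "true" minimal risk, which is the risk of the optimized portfolio in the ideal world where C is perfectly
known:

$$\mathcal{R}_{\rm true}^2 = \mathbf{w}_C^T C\, \mathbf{w}_C = \frac{\mathcal{G}^2}{\mathbf{g}^T C^{-1} \mathbf{g}}. \tag{24}$$

• The "out-of-sample" risk which is the risk of the portfolio constructed using E, but observed on the next
(independent) period of time. The expected risk is then:

$$\mathcal{R}_{\rm out}^2 = \mathbf{w}_E^T C\, \mathbf{w}_E = \mathcal{G}^2\, \frac{\mathbf{g}^T E^{-1} C E^{-1} \mathbf{g}}{\left(\mathbf{g}^T E^{-1} \mathbf{g}\right)^2}. \tag{25}$$

This last quantity is obviously the most important one in practice.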

If we assume that E is a noisy, but unbiased estimator of C, such that $\langle E \rangle = C$, one can use a convexity argument for
the inverse of positive definite matrices to show that in general:

$$\langle \mathbf{g}^T E^{-1} \mathbf{g} \rangle \geq \mathbf{g}^T C^{-1} \mathbf{g}. \tag{26}$$

Hence for large matrices, for which the result is self-averaging:

$$\mathcal{R}_{\rm in}^2 \leq \mathcal{R}_{\rm true}^2. \tag{27}$$

By optimality, one clearly has:

$$\mathcal{R}_{\rm true}^2 \leq \mathcal{R}_{\rm out}^2. \tag{28}$$

These results show that the out-of-sample risk of an optimized portfolio is larger (and in practice, much larger, see
Section V B below) than the in-sample risk, which itself is an underestimate of the true minimal risk. This is a general
situation: using past returns to optimize a strategy always leads to over-optimistic results because the optimization
adapts to the particular realization of the noise, and is unstable in time. Using the Random Matrix results of the next
sections, one can show that for IID returns, with an arbitrary "true" correlation matrix C, the risk of large portfolios
obeys [...]:

$$\mathcal{R}_{\rm in} = \mathcal{R}_{\rm true}\sqrt{1 - q} = \mathcal{R}_{\rm out}\,(1 - q), \tag{29}$$

where q = N/T = 1/Q. The out-of-sample risk is therefore $1/\sqrt{1 - q}$ times larger than the true risk, while the
in-sample risk is a factor $\sqrt{1 - q}$ smaller than the true risk. This is a typical data snooping effect. Only in the limit $q \to 0$ will
these risks coincide, which is expected since in this case the measurement noise disappears, and E = C. In the limit
$q \to 1$, on the other hand, the in-sample risk becomes zero since it becomes possible to find eigenvectors (portfolios)
with exactly zero eigenvalues, i.e., zero in-sample risk. The underestimation of the risk turns out to be even stronger
in the case of a multivariate Student model for returns [...]. In any case, the optimal determination of the correlation
matrix based on empirical data should be such that the ratio $\mathcal{R}_{\rm true}/\mathcal{R}_{\rm out} \leq 1$ is as large as possible.
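
The three risks and their ratios can be simulated directly. The following sketch assumes IID Gaussian returns, a true correlation matrix equal to the identity, and randomly drawn hypothetical gains; the ratios of squared risks are averaged over a few independent trials to reduce finite-size fluctuations.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, n_trials = 200, 800, 20          # q = N/T = 0.25
q = N / T
G = 1.0
C = np.eye(N)                          # "true" correlation matrix (identity, for simplicity)

ratios_in, ratios_out = [], []
for _ in range(n_trials):
    g = rng.standard_normal(N)         # hypothetical predicted gains
    X = rng.standard_normal((T, N)) / np.sqrt(T)
    E = X.T @ X                        # in-sample empirical correlation matrix
    Einv_g = np.linalg.solve(E, g)

    R2_in = G**2 / (g @ Einv_g)                                   # in-sample risk (squared)
    R2_true = G**2 / (g @ np.linalg.solve(C, g))                  # true minimal risk (squared)
    R2_out = G**2 * (Einv_g @ C @ Einv_g) / (g @ Einv_g) ** 2     # out-of-sample risk (squared)
    ratios_in.append(R2_in / R2_true)
    ratios_out.append(R2_out / R2_true)

ratio_in = np.mean(ratios_in)          # expected close to 1 - q
ratio_out = np.mean(ratios_out)        # expected close to 1 / (1 - q)
print(ratio_in, 1 - q, ratio_out, 1 / (1 - q))
```

With q = 0.25 the in-sample variance underestimates the true one by about 25%, while the realized out-of-sample variance is about 33% larger, in line with the squared version of Eq. (29).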

In order to get some general intuition on how the Markowitz optimal portfolio might not be optimal at all, let us
rewrite the solution Eq. (22) above in terms of eigenvalues and eigenvectors:

$$w_{C,i} \propto \sum_{\alpha j} \lambda_\alpha^{-1}\, V_{\alpha,i} V_{\alpha,j}\, g_j \equiv g_i + \sum_{\alpha j} \left(\lambda_\alpha^{-1} - 1\right) V_{\alpha,i} V_{\alpha,j}\, g_j. \tag{30}$$

The first term corresponds to the naive solution: one should invest proportionally to the expected gain (in units
where $s_i = 1$). The correction term means that the weights of eigenvectors with $\lambda_\alpha > 1$ must be reduced, whereas the
weights of eigenvectors with $\lambda_\alpha < 1$ should be enhanced. The optimal Markowitz solution may allocate a substantial
weight to small eigenvalues, which may be entirely dominated by measurement noise and hence unstable. There are
several ways to clean the correlation matrix so as to tame these spurious small risk portfolios, in particular based
on Random Matrix Theory ideas. We will come back to this point in Sect. V B.

D. Non equal time correlations and more general rectangular correlation matrices

The equal time correlation matrix is clearly important for risk purposes, and also to understand the structure
of the market, or more generally the 'Principal Components' driving the process under consideration. A natural
extension, very useful for prediction purposes, is to study a lagged correlation matrix between past and future returns.
Let us define $C_{ij}(\tau)$ as:

$$C_{ij}(\tau) = \langle r_i^t\, r_j^{t+\tau} \rangle, \tag{31}$$

such that $C_{ij}(\tau = 0) = C_{ij}$ is the standard correlation coefficient. Whereas $C_{ij}$ is clearly a symmetric matrix,
$C_{ij}(\tau > 0)$ is in general non symmetric, and only obeys $C_{ij}(\tau) = C_{ji}(-\tau)$. How does one extend the idea of 'Principal
Components', seemingly associated to the diagonalisation of $C_{ij}$, to these asymmetric cases?

The most general case looks in fact even worse: one could very well measure the correlation between N 'input'
variables $X_i$, $i = 1, \dots, N$ and M 'output' variables $Y_a$, $a = 1, \dots, M$. The X's and the Y's may be completely different
from one another (for example, X could be production indicators and Y inflation indexes), or, as in the above example,
the same set of observables but observed at different times: N = M, $X_i^t = r_i^t$ and $Y_a^t = r_a^{t+\tau}$. The cross-correlations
between X's and Y's are characterized by a rectangular N × M matrix $\mathcal{C}$ defined as:

$$\mathcal{C}_{ia} = \langle X_i Y_a \rangle \tag{32}$$

(we assume that both X's and Y's have zero mean and variance unity). If there is a total of T observations, where
both $X_i^t$ and $Y_a^t$, $t = 1, \dots, T$ are observed, the empirical estimate of $\mathcal{C}$ is, after standardizing X and Y:

$$\mathcal{E}_{ia} = \frac{1}{T}\sum_{t=1}^{T} X_i^t\, Y_a^t. \tag{33}$$

What can be said about these rectangular, non symmetric correlation matrices? The singular value decomposition
(SVD) answers the question in the following sense: what is the (normalized) linear combination of X's on the one
hand, and of Y's on the other hand, that have the strongest mutual correlation? In other words, what is the best
pair of predictor and predicted variables, given the data? The largest singular value $c_{\max}$ and its corresponding left
and right eigenvectors answer precisely this question: the eigenvectors tell us how to construct these optimal linear
combinations, and the associated singular value gives us the strength of the cross-correlation: $0 \leq c_{\max} \leq 1$. One can
now restrict both the input and output spaces to the N − 1 and M − 1 dimensional sub-spaces orthogonal to the two
eigenvectors, and repeat the operation. The list of singular values $c_a$ gives the prediction power, in decreasing order,
of the corresponding linear combinations. This is called "Canonical Component Analysis" (CCA) in the literature
[...]; surprisingly in view of its wide range of applications, this method of investigation has been somewhat neglected
since it was first introduced in 1936 [...].

How to get these singular values and the associated left and right eigenvectors? The trick is to consider the N × N
matrix $\mathcal{C}\mathcal{C}^T$, which is now symmetric and has N non negative eigenvalues, each of which is equal to the square of
a singular value of $\mathcal{C}$ itself. The eigenvectors give us the weights of the linear combination of the X's that construct
the 'best' predictors in the above sense. One then forms the M × M matrix $\mathcal{C}^T\mathcal{C}$ that has exactly the same non zero
eigenvalues as $\mathcal{C}\mathcal{C}^T$; the corresponding eigenvectors now give us the weights of the linear combination of the Y's that
construct the 'best' predictees. If M > N, $\mathcal{C}^T\mathcal{C}$ has M − N additional zero eigenvalues; whereas when M < N it is $\mathcal{C}\mathcal{C}^T$
that has an excess of N − M zero eigenvalues. The list of the non zero eigenvalues, $c_{\max}^2 = c_1^2 > c_2^2 > \dots > c_{\min(N,M)}^2$,
gives a sense of the predictive power of the X's on the behaviour of the Y's. However, as for standard correlation
matrices, the empirical determination of $\mathcal{C}$ is often strewn with measurement noise and RMT will help sorting out
grain from chaff, i.e. what is true information (the "grain") in the spectrum of $\mathcal{C}$ and what is presumably the "chaff".
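
The procedure described above can be sketched as follows, on hypothetical data where only the first output variable is partly driven by the first input variable (the coupling 0.8 and the sizes are arbitrary choices made for illustration).

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, T = 4, 6, 5000

# Hypothetical data: the first output variable is partly driven by the first input variable
X = rng.standard_normal((T, N))
Y = rng.standard_normal((T, M))
Y[:, 0] += 0.8 * X[:, 0]

# Standardize, then form the empirical N x M cross-correlation matrix
X = (X - X.mean(axis=0)) / X.std(axis=0)
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
C_xy = X.T @ Y / T

U, c, Vt = np.linalg.svd(C_xy, full_matrices=False)   # singular values in decreasing order

# Squared singular values are the eigenvalues of the N x N matrix C C^T ...
eig_left = np.sort(np.linalg.eigvalsh(C_xy @ C_xy.T))[::-1]
# ... and the non zero eigenvalues of the M x M matrix C^T C (M - N extra zeros here)
eig_right = np.sort(np.linalg.eigvalsh(C_xy.T @ C_xy))[::-1]

print(np.round(c, 3))
print(np.round(U[:, 0], 2), np.round(Vt[0], 2))        # weights of the best predictor / predictee
```

The top singular value recovers the planted correlation, while the remaining ones are small but non zero even though there is no true signal left: this residual is precisely the measurement noise that needs to be separated from genuine information.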

III. RANDOM MATRIX THEORY: THE BULK 
A. Preliminaries 

Random Matrix Theory (RMT) attempts to make statements about the statistics of the eigenvalues $\lambda_\alpha$ of large
random matrices, in particular the density of eigenvalues $\rho(\lambda)$, defined as:

$$\rho_N(\lambda) = \frac{1}{N}\sum_{\alpha=1}^{N} \delta(\lambda - \lambda_\alpha), \tag{34}$$

where $\lambda_\alpha$ are the eigenvalues of the N × N symmetric matrix H that belongs to the statistical ensemble under scrutiny.
It is customary to introduce the resolvent $G_H(z)$ of H (also called the Stieltjes transform), where $z$ is a complex
number:

$$G_H(z) = \frac{1}{N}\, \mathrm{Tr}\left[(zI - H)^{-1}\right], \tag{35}$$

from which one can extract the spectrum as:

$$\rho_N(\lambda) = \lim_{\epsilon \to 0} \frac{1}{\pi}\, \Im\left(G_H(\lambda - i\epsilon)\right). \tag{36}$$

In the limit where N tends to infinity, it often (but not always) happens that the density of eigenvalues $\rho_N$ tends
almost surely to a unique well defined density $\rho_\infty(\lambda)$. This means that the density $\rho_N$ becomes independent of the
specific realization of the matrix H, provided H is a 'typical' sample within its ensemble. This property, called
'ergodicity' or 'self-averaging', is extremely important for practical applications since the asymptotic result $\rho_\infty(\lambda)$ can
be used to describe the eigenvalue density of a single instance. This is clearly one of the keys to the success of RMT.
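
As an illustration of how the resolvent gives access to the density, the sketch below evaluates G on a single large symmetric Gaussian matrix at a small but finite imaginary part. The matrix size, the value `eps = 0.05` and the helper names are choices made here; the comparison values are those of the Wigner semi-circle of unit variance, $\sqrt{4 - \lambda^2}/(2\pi)$.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 1000
A = rng.standard_normal((N, N))
H = (A + A.T) / np.sqrt(2 * N)      # symmetric Gaussian matrix, off-diagonal variance 1/N
eigs = np.linalg.eigvalsh(H)

def resolvent(z, eigs):
    """G_H(z) = (1/N) Tr (z - H)^{-1}, evaluated from the eigenvalues."""
    return np.mean(1.0 / (z - eigs))

def density(lam, eigs, eps=0.05):
    """(1/pi) Im G(lam - i eps) at a small but finite eps (Lorentzian smoothing of width eps)."""
    return resolvent(lam - 1j * eps, eigs).imag / np.pi

rho_0 = density(0.0, eigs)
rho_1 = density(1.0, eigs)
print(rho_0, 1 / np.pi, rho_1, np.sqrt(3) / (2 * np.pi))
```

A single realization already reproduces the asymptotic density to within a few percent, which is the self-averaging property in action; the finite `eps` trades resolution against statistical noise.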

Several other 'transforms', beyond the resolvent $G(z)$, turn out to be useful for our purposes. One is the so-called
'Blue function' $B(z)$, which is the functional inverse of $G(z)$, i.e.: $B[G(z)] = G[B(z)] = z$. The R-transform is simply
related to the Blue function through [...]:

$$R(z) = B(z) - z^{-1}. \tag{37}$$

It is a simple exercise to show that $R(z)$ obeys the following property:

$$R_{aH}(z) = a\, R_H(az), \tag{38}$$

where $a$ is an arbitrary real number. Furthermore, $R(z)$ can be expanded in powers of $z$ as $R(z) = \sum_{k=1}^{\infty} c_k z^{k-1}$,
where the coefficients $c_k$ can be thought of as cumulants (see below). For example, $c_1 = \int d\lambda\, \lambda\, \rho(\lambda)$. When $c_1 = 0$,
$c_2 = \int d\lambda\, \lambda^2 \rho(\lambda)$.
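
The scaling property (38) follows in a few lines from the definitions of $G$ and $B$. The steps below are a sketch for $a \neq 0$, with the intermediate variable $y$ introduced here for convenience:

```latex
\begin{align*}
G_{aH}(z) &= \frac{1}{N}\,\mathrm{Tr}\left[(zI - aH)^{-1}\right]
           = \frac{1}{a}\, G_H\!\left(\frac{z}{a}\right), \\
G_{aH}\big(B_{aH}(y)\big) = y
  &\;\Longrightarrow\; G_H\!\left(\frac{B_{aH}(y)}{a}\right) = a y
  \;\Longrightarrow\; B_{aH}(y) = a\, B_H(a y), \\
R_{aH}(y) &= B_{aH}(y) - \frac{1}{y}
           = a\left[B_H(a y) - \frac{1}{a y}\right]
           = a\, R_H(a y).
\end{align*}
```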

The last object that we will need is more cumbersome. It is called the S-transform and is defined as follows

$$S(z) = -\frac{1+z}{z}\, \eta^{-1}(1+z) \quad \text{where} \quad \eta(y) \equiv -\frac{1}{y}\, G\!\left(-\frac{1}{y}\right). \tag{39}$$

In the following, we will review several RMT results on the bulk density of states $\rho_\infty(\lambda)$ that can be obtained using
an amazingly efficient concept: matrix freeness [...]. The various fancy transforms introduced above will then appear
more natural.



B. Free Matrices 



Freeness is the generalization to matrices of the concept of independence for random variables. Loosely speaking,
two matrices A and B are mutually free if their eigenbases are related to one another by a random rotation, or said
differently if the eigenvectors of A and B are almost surely orthogonal. A more precise and comprehensive definition
can be found in, e.g. [...], but our simplified definition, and the following examples, will be sufficient for our purposes.

Let us give two important examples of mutually free matrices. The first one is nearly trivial. Take two fixed
matrices A and B, and choose a certain rotation matrix O within the orthogonal group $O(N)$, uniformly over the
Haar measure. Then A and $O^T B O$ are mutually free. The second is more interesting, and still not very esoteric. Take
two matrices $H_1$ and $H_2$ chosen independently within the GOE ensemble, i.e. the ensemble of symmetric matrices such
that all entries are IID Gaussian variables. Since the measure of this ensemble of random matrices is invariant under
orthogonal transformation, it means that the rotation matrix $O_1$ diagonalizing $H_1$ is a random rotation matrix over
$O(N)$ (this is actually a convenient numerical method to generate random rotation matrices). The rotation $O_1^T O_2$
from the eigenbasis of $H_1$ to that of $H_2$ is therefore also random, and $H_1$ and $H_2$ are mutually free. More examples
will be encountered below.

Now, matrix freeness allows one to compute the spectrum of the sum of matrices, knowing the spectrum of each of
the matrices, provided they are mutually free. More precisely, if $R_A(z)$ and $R_B(z)$ are the R-transforms of two free
matrices A and B, then:

$$ R_{A+B}(z) = R_A(z) + R_B(z). \qquad (40) $$

This result clearly generalizes the convolution rule for the sum of two independent random variables, for which the
logarithm of the characteristic function is additive. Once $R_{A+B}(z)$ is known, one can in principle invert the
R-transform to reach the eigenvalue density of A + B.
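
As a hedged numerical illustration of Eq. (40), one may take A with eigenvalues +1 and -1 in equal proportions and B a randomly rotated copy of A. A standard result of free probability is that the free sum of two such symmetric ±1 spectra is the arcsine law on [-2, 2], whose second and fourth moments are 2 and 6. The sketch below assumes NumPy; drawing the Haar rotation from the QR decomposition of a Gaussian matrix is our choice, not something prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
# A and B both have eigenvalues +1 and -1, each with weight 1/2
d = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])
A = np.diag(d)

# Haar-distributed orthogonal matrix from the QR decomposition of a Gaussian matrix
Q, R = np.linalg.qr(rng.standard_normal((n, n)))
Q = Q * np.sign(np.diag(R))      # fix column signs so that Q is Haar distributed
B = Q.T @ np.diag(d) @ Q         # randomly rotated copy of A

eig = np.linalg.eigvalsh(A + B)
m2 = np.mean(eig ** 2)           # expected close to 2
m4 = np.mean(eig ** 4)           # expected close to 6 (arcsine law on [-2, 2])
print(m2, m4)
```

Had A and B been diagonal in the same basis (the opposite of free), the sum would instead have eigenvalues -2, 0 and 2 only, which shows that the result depends on the relative orientation of the eigenbases and not only on the two spectra.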

There is an analogous result for the product of non-negative random matrices. In this case, the S-transform is
multiplicative:

$$ S_{AB}(z) = S_A(z)\,S_B(z). \qquad (41) $$

In the rest of this section, we will show how these powerful rules allow one to establish very easily several well
known eigenvalue densities for large matrices, as well as some newer results.



C. Application: Wigner and Marcenko & Pastur

Let us start with the celebrated Wigner semi-circle for Gaussian Orthogonal matrices. As stated above, two such
matrices $H_1$ and $H_2$ are mutually free. Furthermore, because of the stability of Gaussian variables under addition,
$(H_1 + H_2)/\sqrt{2}$ is in the same ensemble. One therefore has:

$$ R_{\sqrt{2}H}(z) = R_{H_1+H_2}(z) = R_{H_1}(z) + R_{H_2}(z) = 2R_H(z). \qquad (42) $$

Using the result Eq. (38) above with $a = \sqrt{2}$, one finds that R(z) must obey:

$$ 2R_H(z) = \sqrt{2}\,R_H(\sqrt{2}z) \;\longrightarrow\; R_H(z) = z, \qquad (43) $$

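where we have assumed the standard normalization $\mathrm{Tr}\,H^2 = 1$. One can check easily that $R(z) = z$ is indeed the
R-transform of the Wigner semi-circle. There is another nice Central Limit Theorem-like way of establishing this result.
Suppose $H_i$, $i = 1, \dots, \mathcal{M}$ are 'small' traceless random matrices, such that each element has a variance equal to
$\epsilon^2/N$ with $\epsilon \to 0$. Expand their resolvent $G_i(z)$ in 1/z:

$$ G(z) = \frac{1}{z} + \frac{\epsilon^2}{z^3} + O(\epsilon^3/z^4) \;\longrightarrow\; \frac{1}{G} \approx z - \frac{\epsilon^2}{z}. $$

Hence,

$$ G(z) \approx \frac{1}{z - \epsilon^2 z^{-1}} \;\longrightarrow\; R(z) = B(z) - \frac{1}{z} \approx \epsilon^2 z + O(\epsilon^3 z^2). $$

Now if these $\mathcal{M}$ matrices are mutually free, with $\epsilon = \mathcal{M}^{-1/2}$ and $\mathcal{M} \to \infty$, then the R-transform of the sum of such
matrices is:

$$ R(z) = \mathcal{M}\epsilon^2 z + O(\mathcal{M}\epsilon^3 z^2) \;\to_{\mathcal{M} \to \infty}\; z. $$

Therefore the sum of $\mathcal{M}$ 'small' centered matrices has a Wigner spectrum in the large $\mathcal{M}$ limit, with computable
corrections.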
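
The Central Limit Theorem-like argument can be probed numerically. The sketch below is an illustration under our own assumptions (NumPy; each 'small' matrix is taken as a randomly rotated diagonal matrix with entries ±ε, which is traceless but far from Gaussian; `haar_orthogonal` is a hypothetical helper name). It sums M such matrices with ε = M^{-1/2} and compares the second and fourth moments of the spectrum with the semi-circle values 1 and 2.

```python
import numpy as np

def haar_orthogonal(n, rng):
    """Haar-distributed orthogonal matrix via QR of a Gaussian matrix (hypothetical helper)."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(2)
n, M = 200, 50
eps = M ** -0.5
# Traceless 'small' spectrum: half of the eigenvalues +eps, half -eps
d = eps * np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

H = np.zeros((n, n))
for _ in range(M):
    O = haar_orthogonal(n, rng)
    H += (O * d) @ O.T           # O diag(d) O^T, freshly rotated each time

eig = np.linalg.eigvalsh(H)
m2, m4 = np.mean(eig ** 2), np.mean(eig ** 4)
print(m2, m4)                    # semi-circle on [-2, 2]: m2 = 1, m4 = 2
```

Rotating each term independently is what enforces freeness here; without the rotations the M diagonal matrices would simply add up eigenvalue by eigenvalue.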

The next example is to consider empirical correlation in the case where the true correlation matrix is the identity:
C = I. Then, E is by definition the sum of rank one matrices $\delta E^t_{ij} = (r_i^t r_j^t)/T$, where $r_i^t$ are independent, unit
variance random variables. Hence, $\delta E^t$ has one eigenvalue equal to q (for large N) associated with direction $r^t$, and
N - 1 zero eigenvalues corresponding to the hyperplane perpendicular to $r^t$. The different $\delta E^t$ are therefore mutually
free and one can use the R-transform trick. Since:

$$ \delta G^t(z) = \frac{1}{N}\left(\frac{1}{z-q} + \frac{N-1}{z}\right), \qquad (44) $$

inverting $\delta G(z)$ to first order in 1/N, the elementary Blue transform reads:

$$ \delta B(z) = \frac{1}{z} + \frac{q}{N(1-qz)} \;\longrightarrow\; \delta R(z) = \frac{q}{N(1-qz)}. \qquad (45) $$

Using the addition of R-transforms, one then deduces:

$$ B_E(z) = \frac{1}{z} + \frac{1}{1-qz} \;\longrightarrow\; G_E(z) = \frac{(z+q-1) - \sqrt{(z+q-1)^2 - 4zq}}{2zq}, \qquad (46) $$

FIG. 1: Marcenko & Pastur spectrum for Q = T/N = 3.45 (dotted line) compared to the spectrum of the exponentially
weighted moving average correlation random matrix with $q \equiv N\epsilon = 1/2$ (plain line).



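which reproduces the well known Marcenko & Pastur result for the density of eigenvalues (for q < 1) [1]:

$$ \rho_E(\lambda) = \frac{\sqrt{4\lambda q - (\lambda + q - 1)^2}}{2\pi\lambda q}, \qquad \lambda \in [(1-\sqrt{q})^2, (1+\sqrt{q})^2]. \qquad (47) $$

This distribution is plotted in Fig. 1 for Q = 1/q = 3.45. The remarkable feature of this result is that there should be
no eigenvalue outside the interval $[(1-\sqrt{q})^2, (1+\sqrt{q})^2]$ when $N \to \infty$. One can check that $\rho_E(\lambda)$ converges towards
$\delta(\lambda - 1)$ when $q = 1/Q \to 0$, or $T \gg N$. When q > 1, we know that some zero eigenvalues necessarily appear in the
spectrum, which then reads:

$$ \rho_E(\lambda) = (1 - Q)\,\delta(\lambda) + \frac{\sqrt{4\lambda q - (\lambda + q - 1)^2}}{2\pi\lambda q}. \qquad (48) $$
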
Using $G_E(z)$, it is straightforward to show that $(1/N)\mathrm{Tr}\,E^{-1} = -G_E(0)$ is given by $(1-q)^{-1}$ for q < 1. This was
alluded to in Sect. II C above. The Marcenko-Pastur law is important because of its large degree of universality: as for
the Wigner semi-circle, it holds whenever the random variables $r_i^t$ are IID with a finite second moment (but see Sect.
III D below for other 'universality classes'). [1^
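
A short numerical check of the Marcenko-Pastur law, in the case q < 1 and C = I, can be sketched as follows. It assumes NumPy and Gaussian returns; the function name `mp_density` and the sizes N = 200, T = 800 are illustrative choices of ours. The sketch verifies that the density is normalized, that the sample eigenvalues stay close to the predicted band, and that (1/N) Tr E^{-1} is close to 1/(1 - q).

```python
import numpy as np

def mp_density(lam, q):
    """Marcenko-Pastur density for q = N/T < 1 (zero outside the band)."""
    lam = np.asarray(lam, dtype=float)
    disc = 4.0 * lam * q - (lam + q - 1.0) ** 2
    return np.sqrt(np.clip(disc, 0.0, None)) / (2.0 * np.pi * lam * q)

rng = np.random.default_rng(3)
N, T = 200, 800
q = N / T
X = rng.standard_normal((T, N)) / np.sqrt(T)   # normalized T x N matrix of returns
E = X.T @ X                                    # empirical correlation matrix (C = I)
eig = np.linalg.eigvalsh(E)

lam_minus, lam_plus = (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2
grid = np.linspace(lam_minus, lam_plus, 20001)
norm = np.sum(mp_density(grid, q)) * (grid[1] - grid[0])   # Riemann sum of the density
inv_trace = np.mean(1.0 / eig)                             # (1/N) Tr E^{-1}
print(norm, eig.min(), eig.max(), inv_trace, 1.0 / (1.0 - q))
```

At finite N the extreme eigenvalues fluctuate slightly around the band edges, which is why the comparison with the edges should only be made within a small tolerance.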

Consider now the case where the empirical matrix is computed using an exponentially weighted moving average 
(still with C = I). Such an estimate is standard practice in finance. More precisely: 

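$$ E_{ij} = \epsilon \sum_{t'=-\infty}^{t-1} (1-\epsilon)^{t-t'}\, r_i^{t'} r_j^{t'}, \qquad (49) $$

with $\epsilon < 1$. Now, as an ensemble, $E_{ij}$ satisfies $E_{ij} = (1-\epsilon)E_{ij} + \epsilon\, r_i^0 r_j^0$. We again invert the resolvent of $E_0$ to find
the elementary Blue transform,

$$ B_0(z) = \frac{1}{z} + R_0(z) \quad \text{with} \quad R_0(z) = \frac{q}{N(1-qz)}, \qquad (50) $$

where now $q = N\epsilon$. Using again Eq. (38), we then find for R(z), to first order in 1/N:

$$ R(z) + zR'(z) = \frac{1}{1-qz} \;\longrightarrow\; R(z) = -\frac{\log(1-qz)}{qz}. \qquad (51) $$

Going back to the resolvent to find the density, we finally get [36]:

$$ \rho(\lambda) = \frac{1}{\pi}\,\Im G(\lambda) \quad \text{where } G(\lambda) \text{ solves } \quad \lambda q G = q - \log(1 - qG). \qquad (52) $$

This solution is compared to the standard Wishart distribution in Fig. 1.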
A nice property of the Blue functions is that they can be used to find the edges of the eigenvalue spectrum ($\lambda_\pm$).
One has [37]:

$$ \lambda_\pm = B(z_\pm) \quad \text{where} \quad B'(z_\pm) = 0. \qquad (53) $$

In the case at hand, by evaluating B(z) when B'(z) = 0 we can write directly an equation whose solutions are the
spectrum edges ($\lambda_\pm$):

$$ \lambda_\pm = \log(\lambda_\pm) + q + 1. \qquad (54) $$

When q is zero, the spectrum is again $\delta(\lambda - 1)$ as expected. But as the noise increases (or the characteristic time
decreases) the lower edge approaches zero very quickly as $\lambda_- \sim \exp(-1/Q)$. Although there are no exact zero eigenvalues
for these matrices, the smallest eigenvalue is exponentially close to zero when $Q \to 0$, i.e. $N \gg T$.
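
The edge equation has no closed-form solution, but it is easy to solve numerically. The sketch below is one possible way to do it, using plain bisection and only the standard library; the function name `ewma_edges` and the bracketing intervals are our own choices, and q > 0 is assumed so that the two roots are distinct.

```python
import math

def ewma_edges(q):
    """Two roots of lam = log(lam) + q + 1, i.e. the edges of the EWMA spectrum (q > 0)."""
    def f(lam):
        return lam - math.log(lam) - q - 1.0

    def bisect(lo, hi):
        # assumes f(lo) and f(hi) have opposite signs
        f_lo_positive = f(lo) > 0.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if (f(mid) > 0.0) == f_lo_positive:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    lam_minus = bisect(1e-300, 1.0)         # f > 0 near zero, f(1) = -q < 0
    lam_plus = bisect(1.0, 2.0 * q + 4.0)   # f(2q + 4) > 0 because log(x) <= x / e
    return lam_minus, lam_plus

lam_minus, lam_plus = ewma_edges(0.5)
print(lam_minus, lam_plus)
```

For q = 1/2, the value used in Fig. 1, the lower edge is already well below the Marcenko-Pastur lower edge obtained for the same value of q.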



D. More applications 

1. The case of an arbitrary true correlation matrix

In general, the random variables under consideration are described by a 'true' correlation matrix C with some non
trivial structure, different from the identity matrix 1. Interestingly, the Marcenko-Pastur result for the spectrum of
the empirical matrix E can be extended to a rather general C, and opens the way to characterize the true spectrum
$\rho_C$ even with partial information $Q = T/N < \infty$. However, for a general C, the different projectors $r_i^t r_j^t$ cannot be
assumed to define uncorrelated directions for different t, even if the random variables $r_i^t$ are uncorrelated in time, and
the above trick based on R-transforms cannot be used. However, assuming that the $r_i^t$ are Gaussian, the empirical
matrix E can always be written as $C^{1/2} X [C^{1/2} X]^T$, where X is an N × T rectangular matrix of uncorrelated, unit
variance Gaussian random variables. But since the eigenvalues of $C^{1/2} X [C^{1/2} X]^T$ are the same as those of $C X X^T$,
we can use the S-transform trick mentioned above, with A = C and $B = X X^T$ mutually free, and where the spectrum
of B is by construction given by the Marcenko-Pastur law. This allows one to write down the following self-consistent equation
for the resolvent of E: [69l| [igI. Issjl 

$$ G_E(z) = \int d\lambda\, \rho_C(\lambda)\, \frac{1}{z - \lambda\left(1 - q + q z G_E(z)\right)}, \qquad (55) $$

a result that in fact already appears in the original Marcenko-Pastur paper! One can check that if $\rho_C(\lambda) = \delta(\lambda - 1)$,
one recovers the result given by Eq. (46). Equivalently, the above relation can be written as:

$$ z G_E(z) = Z G_C(Z) \quad \text{where} \quad Z = \frac{z}{1 + q\left(z G_E(z) - 1\right)}, \qquad (56) $$

which is convenient for numerical evaluation [16]. From these equations, one can evaluate $-G_E(0) = \mathrm{Tr}\,E^{-1}$, which
is found to be equal to $\mathrm{Tr}\,C^{-1}/(1-q)$, as we mentioned in the introduction, and used to derive Eq. above.
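
One simple way to evaluate the self-consistent equation numerically is a fixed-point iteration, sketched below for a true spectrum given as a finite list of eigenvalues of C. This is an illustration and not the procedure of the cited reference: the function names, the starting point 1/z and the plain (undamped) iteration are our assumptions, and convergence is only checked here for a point z away from the real axis. For C = I the result is compared with the closed-form Marcenko-Pastur resolvent.

```python
import numpy as np

def dressed_resolvent(z, c_eigs, q, n_iter=200):
    """Iterate G -> mean_k 1 / (z - c_k (1 - q + q z G)), starting from G = 1/z."""
    c_eigs = np.asarray(c_eigs, dtype=float)
    g = 1.0 / z
    for _ in range(n_iter):
        g = np.mean(1.0 / (z - c_eigs * (1.0 - q + q * z * g)))
    return g

def mp_resolvent(z, q):
    """Closed-form resolvent for C = I; the principal square root is adequate for the
    test point below, a general routine would have to pick the root with G ~ 1/z."""
    s = z + q - 1.0
    return (s - np.sqrt(s * s - 4.0 * z * q)) / (2.0 * z * q)

z, q = 4.0 + 1.0j, 0.5
g_num = dressed_resolvent(z, np.ones(10), q)   # C = identity
g_exact = mp_resolvent(z, q)
print(g_num, g_exact)
```

Closer to the real axis, where the density is read off from the imaginary part of G, the plain iteration may converge slowly and a damped update or a root finder is likely to be preferable.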

Note that while the mapping between the true spectrum $\rho_C$ and the empirical spectrum $\rho_E$ is numerically stable,
the inverse mapping is unstable, a little bit like the inversion of a Laplace transform. In order to reconstruct the
spectrum of C from that of E one should therefore use a parametric ansatz of $\rho_C$ to fit the observed $\rho_E$, and not try
to invert directly the above mapping (for more on this, see [itI. [39]).

Note also that the above result does not apply when C has isolated eigenvalues, and only describes continuous
parts of the spectrum. For example, if one considers a matrix C with one large eigenvalue that is separated from the
'Wishart sea', the statistics of this isolated eigenvalue has recently been shown to be Gaussian [i^ (see also below),
with a width $\sim T^{-1/2}$, much smaller than the uncertainty on the bulk eigenvalues ($\sim q^{1/2}$). A naive application of
Eq. (56), on the other hand, would give birth to a 'mini-Wishart' distribution around the top eigenvalue. This would
be the exact result only if the top eigenvalue of C had a degeneracy proportional to N.



2. The Student ensemble case 

Suppose now that the $r_i^t$ are chosen according to the Student multivariate distribution described in Sect. II B above.
Since in this case $r_i^t = \sigma_t \xi_i^t$, the empirical correlation matrix can be written as:

$$ E_{ij} = \frac{1}{T} \sum_t \sigma_t^2\, \xi_i^t \xi_j^t. \qquad (57) $$
In the case where C = 1, this can again be seen as a sum of mutually free projectors, and one can use the R-transform
trick. This allows one to recover the following equations for the resolvent of E, first obtained in the Marcenko-Pastur
paper and exact in the large N, T limit:

$$ \lambda = \frac{G_R}{G_R^2 + \pi^2\rho_E^2} + \int ds\, P(s)\, \frac{\mu\,(s - q\mu G_R)}{(s - q\mu G_R)^2 + \pi^2\rho_E^2 q^2\mu^2}, \qquad (58) $$

$$ 0 = \rho_E \left( -\frac{1}{G_R^2 + \pi^2\rho_E^2} + \int ds\, P(s)\, \frac{q\mu^2}{(s - q\mu G_R)^2 + \pi^2\rho_E^2 q^2\mu^2} \right), \qquad (59) $$

where $G_R$ is the real part of the resolvent, and $P(s) = s^{\mu/2-1} e^{-s}/\Gamma(\mu/2)$ is the distribution of $s = \mu/\sigma^2$ in the case
of a Student distribution; however, the above result holds for other distributions of σ as well, corresponding to the
class of "elliptic" multivariate distributions. The salient results are (i) there is no longer any upper edge of the
spectrum: $\rho_E(\lambda) \sim \lambda^{-1-\mu/2}$ when $\lambda \to \infty$; (ii) but there is a lower edge to the spectrum for all μ. The case C ≠ 1
can also be treated using S-transforms.

Instead of the usual (Pearson) estimate of the correlation matrix, one could use the maximum likelihood procedure,
Eq. (18) above. Surprisingly at first sight, the corresponding spectrum $\rho_{ML}(\lambda)$ is then completely different [9], and
is given by the standard Marcenko-Pastur result! The intuitive reason is that the maximum likelihood estimator
Eq. (18) effectively renormalizes the returns by the daily volatility $\sigma_t$ when $\sigma_t$ is large. Therefore, all the anomalies
brought about by 'heavy days' (i.e. $\sigma_t \gg \sigma_0$) disappear.

Finally, we should mention that another Student Random-Matrix ensemble has been considered in the recent
literature, where instead of having a time dependent volatility $\sigma_t$, it is the global volatility σ that is random, and
distributed according to Eq. (13) [U 0, |4l[. The density of states is then simply obtained by averaging over
Marcenko-Pastur distributions of varying width. Note however that in this case the density of states is not
self-averaging: each matrix realization in this ensemble will lead to a Marcenko-Pastur spectrum, albeit with a random
width.

E. Random SVD 

As we mentioned in Sect. II D, it is often interesting to consider non-symmetrical, or even rectangular correlation
matrices, between N 'input' variables X and M 'output' variables Y. The empirical correlation matrix using T-long
time series is defined by Eq. ([55]) . What can be said about the singular value spectrum of E in the special limit
$N, M, T \to \infty$, with n = N/T and m = M/T fixed? Whereas the natural null hypothesis for correlation matrices is
C = 1, that leads to the Marcenko-Pastur density, the null hypothesis for cross-correlations between a priori unrelated
sets of input and output variables is C = 0. However, in the general case, input and output variables can very well be
correlated between themselves, for example if one chooses redundant input variables. In order to establish a universal
result, one should therefore consider the exact normalized principal components for the sample variables X's and Y's:

$$ \hat{X}_\alpha^t = \frac{1}{\sqrt{T\lambda_\alpha}} \sum_i V_{\alpha,i}\, X_i^t, \qquad (60) $$

and similarly for the $\hat{Y}_\alpha^t$. The $\lambda_\alpha$ and the $V_{\alpha,i}$ are the eigenvalues and eigenvectors of the sample correlation matrix
$E_X$ (or, respectively, $E_Y$). We now define the normalized M × N cross-correlation matrix as $\hat{E} = \hat{Y}\hat{X}^T$. One can then
use the following tricks:

• The non zero eigenvalues of $\hat{E}^T\hat{E}$ are the same as those of $\hat{X}^T\hat{X}\hat{Y}^T\hat{Y}$.

• $A = \hat{X}^T\hat{X}$ and $B = \hat{Y}^T\hat{Y}$ are two mutually free T × T matrices, with N (M) eigenvalues exactly equal to 1
(due to the very construction of $\hat{X}$ and $\hat{Y}$), and $(T - N)^+$ ($(T - M)^+$) equal to 0.

• The S-transforms are multiplicative, allowing one to obtain the spectrum of AB.

Due to the simplicity of the spectra of A and B, the calculation of S-transforms is particularly easy Q. The final
result for the density of singular values (i.e. the square-root of the eigenvalues of AB) reads (see [42] for an early
derivation of this result, see also [iH):

$$ \rho(c) = \max(1-n, 1-m)\,\delta(c) + \max(m+n-1, 0)\,\delta(c-1) + \Re\,\frac{\sqrt{(c^2 - \gamma_-)(\gamma_+ - c^2)}}{\pi c\,(1 - c^2)}, \qquad (61) $$

where n = N/T, m = M/T and $\gamma_\pm$ are given by:

$$ \gamma_\pm = n + m - 2mn \pm 2\sqrt{mn(1-n)(1-m)}, \qquad 0 \le \gamma_\pm \le 1. \qquad (62) $$

The allowed c's are all between 0 and 1, as they should since these singular values can be interpreted as correlation
coefficients. In the limit $T \to \infty$ at fixed N, M, all singular values collapse to zero, as they should since there are no
true correlations between X and Y; the allowed band in the limit $n, m \to 0$ becomes:

$$ c \in \left[\, |\sqrt{m} - \sqrt{n}|,\; \sqrt{m} + \sqrt{n} \,\right], \qquad (63) $$

showing that for fixed N, M, the order of magnitude of allowed singular values decays as $T^{-1/2}$.
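
The construction above can be tried on synthetic data. The sketch below is a minimal illustration, assuming NumPy and independent Gaussian inputs and outputs; the sizes T = 400, N = 80, M = 40 and the helper name `svd_band` are ours. Instead of diagonalizing the sample correlation matrices explicitly, it uses a QR decomposition to build an orthonormal basis of each set of time series, which yields the same projectors $\hat{X}^T\hat{X}$ and $\hat{Y}^T\hat{Y}$ and therefore the same singular values.

```python
import numpy as np

def svd_band(n, m):
    """Edges sqrt(gamma_-), sqrt(gamma_+) of the continuous part of the singular value density."""
    root = 2.0 * np.sqrt(m * n * (1.0 - n) * (1.0 - m))
    centre = n + m - 2.0 * m * n
    return np.sqrt(max(centre - root, 0.0)), np.sqrt(centre + root)

rng = np.random.default_rng(4)
T, N, M = 400, 80, 40
X = rng.standard_normal((T, N))    # N independent input series of length T
Y = rng.standard_normal((T, M))    # M independent output series of length T

Qx, _ = np.linalg.qr(X)            # T x N with orthonormal columns (normalized components)
Qy, _ = np.linalg.qr(Y)            # T x M with orthonormal columns
s = np.linalg.svd(Qy.T @ Qx, compute_uv=False)   # singular values of the M x N matrix

c_lo, c_hi = svd_band(N / T, M / T)
print(s.min(), s.max(), c_lo, c_hi)
```

With these sizes n = 0.2 and m = 0.1, so the predicted band is roughly [0.14, 0.71]; the sample singular values are expected to fall inside it up to finite-size fluctuations, even though X and Y are strictly independent.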

Note that one could have considered a different benchmark ensemble, where one considers two independent vector 
time series X and Y with true correlation matrices Cx and Cy equal to 1. The direct SVD spectrum in that case 
can also be computed as the S-convolution of two Marcenko-Pastur distributions with parameters m and n [8]. This
alternative benchmark is however not well suited in practice, since it mixes up the possibly non trivial correlation 
structure of the input variables and of the output variables themselves with the cross-correlations between these 
variables. 

As an example of applications to economic time series, we have studied in the cross correlations between 76 
different macroeconomic indicators (industrial production, retail sales, new orders and inventory indices of all economic 
activity sectors available, etc.) and 34 indicators of inflation, the Composite Price Indices (CPIs), concerning different 
sectors of activity during the period June 1983- July 2005, corresponding to 265 observations of monthly data. The 
result is that only one, or perhaps two, singular values emerge from the above "noise band". From an econometric
point of view, this is somewhat disappointing: there seems to be very little exploitable signal in spite of the quantity 
of available observations. 



F. A Note on "Levy" (or heavy tailed) matrices [44]

All the above results for the bulk part of the spectrum of random matrices are to a large extent universal with 
respect to the distribution of the matrix elements. Although many of these results are easy to obtain assuming that 
the random variables involved in their construction are Gaussian, this is not a crucial assumption. For example, the 
Wigner semi-circle distribution holds for any large symmetric matrices made up of IID elements, provided these have 
a finite second moment. 

The results are however expected to change when the tail of the distribution of these elements is so heavy that the
second moment diverges, corresponding to a tail index μ less than 2. The generalization of the Wigner distribution
in that case (called Levy matrices, because the corresponding ensemble is stable under addition) was worked out in
[4^ using heuristic methods [44]. Their result on $\rho(\lambda)$ was recently rigorously proven in [4^. The remarkable feature
is that the support of the spectrum becomes unbounded; actually $\rho(\lambda)$ decays for large λ with the exact same tail as
that of the distribution of individual elements.

It is worth noticing that although Levy matrices are by construction stable under addition, two such Levy matrices
are not mutually free. The problem comes in particular from the large eigenvalues just mentioned; the corresponding
eigenvectors are close to one of the canonical basis vectors. Therefore one cannot assume that the eigenbases differ by
a random rotation. A different ensemble can however be constructed, where each Levy matrix is randomly rotated
before being summed (see [44]). In this case, freeness is imposed by hand and R-transforms are additive. The
corresponding fixed point generalizing R(z) = z in the Wigner case is then $R(z) = z^{\mu-1}$. The eigenvalue spectrum is
however different from the one obtained in [l^H^, although the asymptotic tails are the same: $\rho(\lambda) \propto \lambda^{-1-\mu}$.

Finally, the generalization of the Marcenko-Pastur result for heavy tailed matrices is also a very recent achievement
[131 . Again, the spectrum loses both its upper and lower sharp edges for all finite values of Q = T/N as soon as the
variance of the random variables $r_i^t$ diverges, i.e. when μ < 2. Note that the resulting spectrum is distinct from the
Student ensemble result obtained above; the latter is different from Marcenko-Pastur for all μ < +∞. However, when
μ < 2, they both share the same power-law tail which is now: $\rho(\lambda) \propto \lambda^{-1-\mu/2}$.



IV. RANDOM MATRIX THEORY: THE EDGES 



A. The Tracy-Widom region



As we alluded to several times, the practical usefulness of the above predictions for the eigenvalue spectra of
random matrices lies in (i) their universality with respect to the distribution of the underlying random variables and (ii)
the appearance of sharp edges in the spectrum, meaning that the existence of eigenvalues lying outside the allowed
band is a strong indication against several null hypothesis benchmarks.

However, the above statements are only true in the asymptotic, $N, T \to \infty$ limit. For large but finite N one
expects that the probability to find an eigenvalue outside the allowed band is very small but finite. The width of the
transition region, and the tail of the density of states, were understood a while ago, culminating in the beautiful results
by Tracy & Widom on the distribution of the largest eigenvalue of a random matrix. There is now a huge literature on
this topic (see e.g. lisl . [49I [sol . IsH ) that we will not attempt to cover here in detail. We will only extract a few
interesting results for applications.

The behaviour of the width of the transition region can be understood using a simple heuristic argument. Suppose
that the N = ∞ density goes to zero near the upper edge $\lambda_+$ as $(\lambda_+ - \lambda)^\theta$ (generically, θ = 1/2 as is the case for the
Wigner and the Marcenko-Pastur distributions). For finite N, one expects not to be able to resolve the density when
the probability to observe an eigenvalue is smaller than 1/N. This criterion reads:

$$ (\lambda_+ - \lambda^*(N))^{\theta+1} \propto \frac{1}{N} \;\longrightarrow\; \Delta\lambda^* \sim N^{-\frac{1}{1+\theta}}, \qquad (64) $$

or a transition region that goes to zero as $N^{-2/3}$ in the generic case. More precisely, for Gaussian ensembles, the
average density of states at a distance $\sim N^{-2/3}$ from the edge behaves as:

$$ \rho_N(\lambda \approx \lambda_+) = N^{-1/3}\, \Phi\!\left[N^{2/3}(\lambda - \lambda_+)\right], \qquad (65) $$

with $\Phi(x \to -\infty) \propto \sqrt{-x}$ as to recover the asymptotic density of states, and $\ln \Phi(x \to +\infty) \propto -x^{3/2}$, showing that the
probability to find an eigenvalue outside of the allowed band decays exponentially with N and super exponentially
with the distance to the edge.

A more precise result concerns the distance between the largest eigenvalue $\lambda_{\max}$ of a random matrix and the upper
edge of the spectrum $\lambda_+$. The Tracy-Widom result is that for a large class of N × N matrices (e.g. symmetric random
matrices with IID elements with a finite fourth moment, or empirical correlation matrices of IID random variables
with a finite fourth moment), the rescaled distribution of $\lambda_{\max} - \lambda_+$ converges towards the Tracy-Widom distribution,
usually noted $F_1$:

$$ \mathrm{Prob}\left(\lambda_{\max} \le \lambda_+ + \gamma N^{-2/3} u\right) = F_1(u), \qquad (66) $$

where γ is a constant that depends on the problem. For example, for the Wigner problem, $\lambda_+ = 2$ and γ = 1; whereas
for the Marcenko-Pastur problem, $\lambda_+ = (1 + \sqrt{q})^2$ and $\gamma = \sqrt{q}\,\lambda_+^{2/3}$.
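
To get a feeling for the orders of magnitude, the width $\gamma N^{-2/3}$ of the Tracy-Widom region for the Marcenko-Pastur problem can be evaluated directly. The helper below is a small sketch using only the standard library; its name is hypothetical, and the values q = 1/2 and N = 500 are simply an example.

```python
import math

def tracy_widom_scale(q, N):
    """Upper edge and width gamma * N^(-2/3) for the Marcenko-Pastur problem."""
    lam_plus = (1.0 + math.sqrt(q)) ** 2
    gamma = math.sqrt(q) * lam_plus ** (2.0 / 3.0)
    return lam_plus, gamma * N ** (-2.0 / 3.0)

lam_plus, width = tracy_widom_scale(q=0.5, N=500)
print(lam_plus, width)   # the width is small compared with the size of the band
```

For these values the upper edge is close to 2.91 and the width close to 0.02, so an eigenvalue found a few tenths above the edge is very unlikely to be a finite-size fluctuation.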

Everything is known about the Tracy-Widom density $f_1(u) = F_1'(u)$, in particular its left and right far tails:

$$ \ln f_1(u) \propto -u^{3/2}, \quad (u \to +\infty); \qquad \ln f_1(u) \propto -|u|^3, \quad (u \to -\infty). \qquad (67) $$

Not surprisingly, the right tail is the same as that of the density of states Φ. The left tail is much thinner: pushing
the largest eigenvalue inside the allowed band implies compressing the whole Coulomb-Dyson gas of charges, which
is difficult. Using this analogy, the large deviation regime of the Tracy-Widom problem (i.e. for $\lambda_{\max} - \lambda_+ = O(1)$)
can be obtained (S^l .

Note that the distribution of the smallest eigenvalue $\lambda_{\min}$ around the lower edge $\lambda_-$ is also Tracy-Widom, except
in the particular case of Marcenko-Pastur matrices with Q = 1. In this case, $\lambda_- = 0$ which is a 'hard' edge since all
eigenvalues of the empirical matrix must be non-negative. This special case is treated in, e.g. [50].

Finally, the distance of the largest singular value from the edge of the random SVD spectrum, Eq. (61) above, is
also governed by a Tracy-Widom distribution, with parameters discussed in detail in [53].



B. The case with large, isolated eigenvalues and condensation transition 

The Wigner and Marcenko-Pastur ensembles are in some sense maximally random: no prior information on the
structure of the matrices is assumed. For applications, however, this is not necessarily a good starting point. In the
example of stock markets, it is intuitive that all stocks are sensitive to global news about the economy, for example.
This means that there is at least one common factor to all stocks, or else that the correlation coefficient averaged
over all pairs of stocks, is positive. A more reasonable null-hypothesis is that the true correlation matrix is: $C_{ii} = 1$,
$C_{ij} = \rho$, $\forall i \neq j$. This essentially amounts to adding to the empirical correlation matrix a rank one perturbation
matrix with one large eigenvalue Nρ, and N - 1 zero eigenvalues. When $N\rho \gg 1$, the empirical correlation matrix
will obviously also have a large eigenvalue close to Nρ, very far above the Marcenko-Pastur upper edge $\lambda_+$. What
happens when Nρ is not very large compared to unity?

This problem was solved in great detail by Baik, Ben Arous and Peche [4§|, who considered the more general
case where the true correlation matrix has k special eigenvalues, called "spikes". A similar problem arises when one
considers Wigner matrices, to which one adds a perturbation matrix of rank k. For example, if the random elements
$H_{ij}$ have a non zero mean h, the problem is clearly of that form: the perturbation has one non zero eigenvalue
Nh, and N - 1 zero eigenvalues. As we discuss now using free random matrix techniques, this problem has a sharp
phase transition between a regime where this rank one perturbation is weak and is "dissolved" in the Wigner sea,
and a regime where this perturbation is strong enough to escape from the Wigner sea. This transition corresponds
to a "condensation" of the eigenvector corresponding to the largest eigenvalue onto the eigenvalue of the rank one
perturbation.

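Let us be more precise using R-transform techniques for the Wigner problem. Assume that the non zero eigenvalue
of the rank one perturbation is Λ, with a corresponding eigenvector $e_1 = (1, 0, \dots, 0)$. The resolvent $G_\Lambda$ and the Blue
function $B_\Lambda$ of this perturbation are:

$$ G_\Lambda(z) = \frac{N-1}{Nz} + \frac{1}{N}\,\frac{1}{z - \Lambda} \;\longrightarrow\; B_\Lambda(z) \approx \frac{1}{z} + \frac{1}{N}\,\frac{\Lambda}{1 - \Lambda z}. \qquad (68) $$

Such a perturbation is free with respect to Wigner matrices. The R-transform of the sum is therefore given by:

$$ R_{H+\Lambda}(z) = z + \frac{1}{N}\,\frac{\Lambda}{1 - \Lambda z} \;\longrightarrow\; z \approx G + \frac{1}{G} + \frac{1}{N}\,\frac{\Lambda}{1 - \Lambda G}, \qquad (69) $$
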
which allows one to compute the corrected resolvent G. The correction term is of order 1/N, and one can substitute G
by the Wigner resolvent $G_W$ to first order. This correction can only survive in the large N limit if $\Lambda \times G_W(z) = 1$
has a non trivial solution, such that the divergence compensates the 1/N factor. The corresponding value of z then
defines an isolated eigenvalue. This criterion leads to [54], [55]:

$$ z = \lambda_{\max} = \Lambda + \frac{1}{\Lambda} \quad (\Lambda > 1); \qquad \lambda_{\max} = 2 \quad (\Lambda \le 1). \qquad (70) $$

Therefore, the largest eigenvalue pops out of the Wigner sea precisely when Λ = 1. The statistics of the largest
eigenvalue $\lambda_{\max}$ is still Tracy-Widom whenever Λ < 1, but becomes Gaussian, of width $N^{-1/2}$ (and not $N^{-2/3}$) when
Λ > 1. The case Λ = 1 is special and is treated in [43]. Using simple perturbation theory, one can also compute the
overlap between the largest eigenvector $V_{\max}$ and $e_1$ [s^:

$$ (V_{\max} \cdot e_1)^2 \approx 1 - \Lambda^{-2}, \quad (\Lambda > 1), \qquad (71) $$

showing that indeed, the coherence between the largest eigenvector and the perturbation becomes progressively lost
when $\Lambda \to 1^+$.
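
The transition can be observed on a single simulated matrix. The following sketch assumes NumPy, a Gaussian Wigner matrix normalized so that its spectrum fills [-2, 2], and the illustrative values N = 500 and Λ = 2 (all of these are our choices). It adds the rank one perturbation along $e_1$ and compares the top eigenvalue and the squared overlap with the predictions Λ + 1/Λ = 2.5 and 1 - Λ^{-2} = 0.75.

```python
import numpy as np

rng = np.random.default_rng(5)
N, Lam = 500, 2.0

a = rng.standard_normal((N, N))
H = (a + a.T) / np.sqrt(2.0 * N)   # Wigner matrix, off-diagonal variance 1/N
H[0, 0] += Lam                      # rank one perturbation Lam * e1 e1^T

vals, vecs = np.linalg.eigh(H)      # eigenvalues in ascending order
lam_max = vals[-1]                  # predicted: Lam + 1/Lam
overlap2 = vecs[0, -1] ** 2         # (V_max . e1)^2, predicted: 1 - 1/Lam^2
print(lam_max, overlap2, vals[-2])  # vals[-2] stays near the edge of the Wigner sea
```

Repeating the experiment with Λ below 1 should leave the top eigenvalue glued to the edge at 2, with an overlap that vanishes as N grows.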

A similar phenomenon takes place for correlation matrices. For a rank one perturbation of the type described
above, with an eigenvalue $\Lambda = N\rho$, the criterion for expelling an isolated eigenvalue from the Marcenko-Pastur sea
now reads [49]:

$$ \lambda_{\max} = \Lambda + \frac{q\Lambda}{\Lambda - 1} \quad (\Lambda > 1 + \sqrt{q}); \qquad \lambda_{\max} = (1 + \sqrt{q})^2 \quad (\Lambda \le 1 + \sqrt{q}). \qquad (72) $$

Note that in the limit $\Lambda \to \infty$, $\lambda_{\max} \approx \Lambda + q + O(\Lambda^{-1})$. For a rank k perturbation, all eigenvalues such that $\Lambda_r > 1 + \sqrt{q}$,
$1 \le r \le k$ will end up isolated above the Marcenko-Pastur sea, all others disappear below $\lambda_+$. All these isolated
eigenvalues have Gaussian fluctuations of order $T^{-1/2}$ (see also Sect. IV D below). For more details about these
results, see 



C. The largest eigenvalue problem for heavy tailed matrices 

The Tracy-Widom result for the largest eigenvalue was first shown for the Gaussian Orthogonal ensemble, but it
was soon understood that the result is more general. In fact, if the matrix elements are IID with a finite fourth
moment, the largest eigenvalue statistics is asymptotically governed by the Tracy-Widom mechanism. Let us give a
few heuristic arguments for this [55]. Suppose the matrix elements are IID with a power-law distribution:

$$ P(H) \sim_{|H| \to \infty} \frac{A^\mu}{|H|^{1+\mu}} \quad \text{with} \quad A \sim O(1/\sqrt{N}), \qquad (73) $$

and μ > 2, such that the asymptotic eigenvalue spectrum is the Wigner semi-circle with $\lambda_\pm = \pm 2$. The largest
element $H_{\max}$ of the matrix (out of $N^2/2$) is therefore of order $N^{2/\mu - 1/2}$ and distributed with a Frechet law. From
the results of the previous subsection, one can therefore expect that:

• If μ > 4: $H_{\max} \ll 1$, and one recovers Tracy-Widom.

• If 2 < μ < 4: $H_{\max} \gg 1$, $\lambda_{\max} \approx H_{\max} \propto N^{2/\mu - 1/2}$, with a Frechet distribution. Note that although $\lambda_{\max} \to \infty$
when $N \to \infty$, the density itself goes to zero when λ > 2 in the same limit.

• If μ = 4: $H_{\max} \sim O(1)$, $\lambda_{\max} = 2$ or $\lambda_{\max} = H_{\max} + 1/H_{\max}$, corresponding to a non-universal distribution for
$\lambda_{\max}$ with a δ-peak at 2 and a transformed Frechet distribution for $\lambda_{\max} > 2$.

Although the above results are expected to hold for $N \to \infty$ (a rigorous proof can be found in [HBl), one should
note that there are very strong finite size corrections. In particular, although for μ > 4 the asymptotic limit is
Tracy-Widom, for any finite N the distribution of the largest eigenvalue has power-law tails that completely mask
the Tracy-Widom distribution - see . Similarly, the convergence towards the Frechet distribution for μ < 4 is also
very slow.



D. Dynamics of the top eigenvector — theory 

As mentioned above and discussed in fuller detail in the next section, financial covariance matrices are such that a
few large eigenvalues are well separated from the 'bulk', where all other eigenvalues reside. We have indeed seen that
if stocks tend to be correlated on average, a large eigenvalue $\lambda_{\max} \sim N\rho$ will emerge. The associated eigenvector is
the so-called 'market mode': in a first approximation, all stocks move together, up or down.

A natural question, of great importance for portfolio management, is whether $\lambda_{\max}$ and the corresponding $V_{\max}$
are stable in time. Of course, the largest eigenvalue and eigenvector of the empirical correlation matrix are affected
by measurement noise. Can one make predictions about the fluctuations of both the largest eigenvalue and the
corresponding eigenvector induced by measurement noise? This would help separating a true evolution in time of the
average stock correlation and of the market exposure of each stock from one simply related to measurement noise.
Such a decomposition seems indeed possible in the limit where $\lambda_{\max}$ is much larger than the bulk eigenvalues.

Suppose that the true covariance matrix C is time independent with one large eigenvalue $\lambda_1$ associated to the
normalized eigenvector $V_1$. Assuming that the covariance matrix $E_t$ is measured through an exponential moving
average of the returns, Eq. (49), with an averaging time 1/ε, one can write down, in the limit $\epsilon \to 0$ and for Gaussian
returns, two Ornstein-Uhlenbeck like equations for the largest eigenvalue of $E_t$, $\lambda_{1t}$, and for its associated eigenvector
$v_{1t}$ [3|- The angle θ between $v_{1t}$ and $V_1$ reaches a stationary distribution given by:

$$ P(\theta) = \mathcal{N} \left[ \frac{1 + \cos 2\theta\,(1 - \lambda_b/\lambda_1)}{1 - \cos 2\theta\,(1 - \lambda_b/\lambda_1)} \right]^{1/4\epsilon}, \qquad (74) $$



where $\lambda_b$ is the average value of the bulk eigenvalues of C, assumed to be $\ll \lambda_1$. As expected, this distribution is
invariant when $\theta \to \pi - \theta$, since $-V_1$ is also a top eigenvector. In the limit $\lambda_b \ll \lambda_1$, one sees that the distribution
becomes peaked around θ = 0 and π. For small θ, the distribution is Gaussian, with $\langle \cos^2\theta \rangle \approx 1 - \epsilon\lambda_b/2\lambda_1$. The
angle θ is less and less fluctuating as $\epsilon \to 0$ (as expected) but also as $\lambda_b/\lambda_1 \to 0$: a large separation of eigenvalues
leads to a well determined top eigenvector. In this limit, the distribution of $\lambda_{1t}$ also becomes Gaussian (as expected
from general results [i^) and one finds, to leading order:

$$ \langle \lambda_{1t} \rangle \approx \lambda_1 - \epsilon\lambda_b/2; \qquad \langle (\delta\lambda_{1t})^2 \rangle \approx \epsilon\lambda_1^2. \qquad (75) $$

In the limit of large averaging time and one large top eigenvalue (a situation approximately realized for financial
markets), the deviation from the true top eigenvalue $\delta\lambda_1$ and the deviation angle θ are independent Gaussian variables.
One can compute the variogram of the top eigenvalue as:

$$ \langle [\lambda_{1,t+\tau} - \lambda_{1,t}]^2 \rangle = 2\lambda_1^2\,\epsilon\,(1 - \exp(-\epsilon\tau)). \qquad (76) $$

One can also work out the average overlap of the top eigenvector with itself as a function of time lag, leading to:

$$ \langle (v_{1t} - v_{1t+\tau})^2 \rangle = 2 - 2\langle \cos(\theta_t - \theta_{t+\tau}) \rangle \approx 2\epsilon\,\frac{\lambda_b}{\lambda_1}\,(1 - \exp(-\epsilon\tau)). \qquad (77) $$

These results assume that C is time independent. Any significant deviation from the above laws would indicate a
genuine evolution of the market structure. We will come back to this point in Section V C.
V. APPLICATIONS: CLEANING CORRELATION MATRICES

A. Empirical eigenvalue distributions

Having now all the necessary theoretical tools in hand, we turn to the analysis of empirical correlation matrices of 
stock returns. Many such studies, comparing the empirical spectrum with RMT predictions, have been published in 
the literature. Here, we perform this analysis once more, on an extended data set, with the objective of comparing 
precisely different cleaning schemes for risk control purposes (see next subsection, V B).

We study the set of U.S. stocks between July, 1993 and April, 2008 (3700 trading days). We consider 26 samples 
obtained by sequentially sliding a window of T = 1000 days by 100 days. For each period, we look at the empirical 
correlation matrix of the N = 500 most liquid stocks during that period. The quality factor is therefore Q = T/N = 2. 
The eigenvalue spectrum shown in Fig. 2 is an average over the 26 sample eigenvalue distributions, where we have 
removed the market mode and rescaled the eigenvalues such that $\int d\lambda\, \rho_E(\lambda) = 1$ for each sample. The largest
eigenvalue contributes on average to 21% of the total trace.

We compare in Fig. 2 the empirical spectrum with the Marcenko-Pastur prediction for Q = l/g = 2. It is clear that 
several eigenvalues leak out of the Marcenko-Pastur band, even after taking into account the Tracy- Widom tail, which 
have a width given by ^X^^ /N'^^^ sa 0.02, very small in the present case. The eigenvectors corresponding to these 
eigenvalues show significant structure, that correspond to identifiable economic sectors. Even after accounting for 
these large eigenvalues, the Marcenko-Pastur prediction is not very good, suggesting that the prior for the underlying 
correlation matrix C may carry more structure than just a handful of eigenvalue "spikes" on top of the identity 
matrix [itI [Toj. An alternative simple prior for the spectrum of C is a power-law distribution, corresponding to the 
coexistence of large sectors and smaller sectors of activity: 

$$\rho_C(\lambda) = \frac{\mu A}{(\lambda - \lambda_0)^{1+\mu}}\, \Theta(\lambda - \lambda_{\min}) \qquad (78)$$

with $A$ and $\lambda_0$ related to $\lambda_{\min}$ by the normalization of $\rho_C$ and by $\mathrm{Tr}\,C = N$ (the latter requiring $\mu > 1$). Using
Eq. one can readily compute the dressed spectrum $\rho_E(\lambda)$. In Fig. 2, we show, on top of the empirical and
Marcenko-Pastur spectrum, the "bare" and the dressed power-law spectrum for $\mu = 2$. For later convenience, we
parameterize the distribution using $\alpha = \lambda_{\min} \in [0, 1]$, in which case $A = (1 - \alpha)^2$ and $\lambda_0 = 2\alpha - 1$ (note that $\alpha = 1$
corresponds to the Marcenko-Pastur case since in this limit $\rho_C(\lambda) = \delta(\lambda - 1)$). The fit shown in Fig. 2 corresponds
to $\alpha = 0.35$, and is now very good, suggesting indeed that the correlation of stocks has a hierarchical structure with
a power-law distribution for the size of sectors (on this point, see also (2^). We should point out that a fit using 
a multivariate Student model also works very well for the Pearson estimator of the empirical correlation matrix. 
However, as noted in [9], such an agreement appears to be accidental. If the Student model were indeed appropriate,
the spectrum of the most likely correlation matrix (see Eq. (fTS)) ') should be given by Marcenko-Pastur, whereas the 
data does not conform to this prediction [9]. This clearly shows that the Student copula is in fact not adequate to
model multivariate correlations. 
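The relation between $A$, $\lambda_0$ and $\lambda_{\min}$ used for the power-law prior can be checked in two lines. The derivation below is a worked sketch, assuming the prior has the form $\rho_C(\lambda) = \mu A/(\lambda - \lambda_0)^{1+\mu}$ for $\lambda \geq \lambda_{\min}$ and zero below, and that $\mathrm{Tr}\,C = N$ means a unit mean eigenvalue:

```latex
\begin{align*}
\int_{\lambda_{\min}}^{\infty} \frac{\mu A \, d\lambda}{(\lambda - \lambda_0)^{1+\mu}}
  &= \frac{A}{(\lambda_{\min} - \lambda_0)^{\mu}} = 1
  \;\Longrightarrow\; A = (\lambda_{\min} - \lambda_0)^{\mu}, \\
\int_{\lambda_{\min}}^{\infty} \frac{\mu A \, \lambda \, d\lambda}{(\lambda - \lambda_0)^{1+\mu}}
  &= \lambda_0 + \frac{\mu}{\mu - 1}\,(\lambda_{\min} - \lambda_0) = 1 \qquad (\mu > 1).
\end{align*}
% For \mu = 2 and \lambda_{\min} = \alpha the second line gives \lambda_0 = 2\alpha - 1,
% and the first line then gives A = (1 - \alpha)^2.
```

The second integral only converges for $\mu > 1$, which is the origin of that requirement.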

A complementary piece of information is provided by the statistics of the eigenvectors. Structure-less eigenvectors 
(i.e. a normalized random vector in $N$ dimensions) have components that follow a Gaussian distribution of variance
$1/N$. The kurtosis of the components for a given eigenvector gives some indication of its "non-random" character (and
is trivially related to the well known inverse participation ratio or Herfindahl index). We show in the inset of Fig. 2
the excess kurtosis as a function of the rank of the eigenvectors (small ranks corresponding to large eigenvalues). We
clearly see that the eigenvectors at both the top and the bottom of the spectrum are not random, while the eigenvectors at the middle
of the band have a very small excess kurtosis. As mentioned above, large eigenvalues correspond to economic sectors,
while small eigenvalues correspond to long-short portfolios that invest in fluctuations with particularly low volatility,
for example the difference between two very strongly correlated stocks within the same sector. 
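The link between the kurtosis of the components and the inverse participation ratio can be made explicit with a short sketch. It assumes the moments are taken about zero (which is what makes the relation exact for a normalized vector); the function name `excess_kurtosis` and the two test vectors are illustrative only.

```python
import numpy as np

def excess_kurtosis(v):
    """Excess kurtosis of the components of v, with moments taken about zero.

    For a normalized vector, mean(v^2) = 1/N and mean(v^4) = IPR/N,
    so the kurtosis is simply N * IPR and the excess kurtosis is N * IPR - 3.
    """
    v = v / np.linalg.norm(v)
    ipr = np.sum(v ** 4)          # inverse participation ratio (Herfindahl index of v^2)
    return v.size * ipr - 3.0

rng = np.random.default_rng(1)
random_vec = rng.standard_normal(10_000)   # structure-less direction
pair = np.zeros(100)
pair[0], pair[1] = 1.0, -1.0               # long-short bet on two stocks only

k_random = excess_kurtosis(random_vec)     # close to zero
k_pair = excess_kurtosis(pair)             # large: the vector is localized on two components
```

A strongly localized vector, such as the long-short pair, has an excess kurtosis of order $N$, whereas a structure-less vector stays close to zero.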

B. RMT inspired cleaning recipes 

As emphasized in Sect. III C, it is a bad idea to use the empirical correlation matrix directly in a Markowitz
optimization program. We have seen that the out-of-sample risk is at best underestimated by a factor $(1 - q)$, but the
situation might be worsened by tail effects and/or by the non-stationarity of the true correlations. Since we know that
measurement noise, induced by finite size effects, significantly distorts the spectrum of the correlation matrix, one
should at the very least try to account for these noise effects before using the correlation matrix in any optimization
program. With the above RMT results in mind, several "cleaning schemes" can be devised. The simplest one, first 
suggested and tested in @, is to keep unaltered all the eigenvalues (and the corresponding eigenvectors) that exceed 
FIG. 2: Main figure: empirical average eigenvalue spectrum of the correlation matrix (plain black line), compared to
the Marcenko-Pastur prediction (dashed line) and the dressed power-law spectrum model (thick line). We also show the bare
power-law distribution with $\mu = 2$ and the optimal value of $\lambda_{\min}$ (dashed-dotted line). Inset: Kurtosis of the components of
the eigenvectors as a function of the eigenvalue rank. One clearly sees some structure emerging at both ends of the spectrum,
whereas the centre of the band is compatible with rotationally invariant eigenvectors.



the Marcenko-Pastur edge $(1 + \sqrt{q})^2$, while replacing all eigenvalues below the edge, deemed as meaningless noise, by
a common value $\bar\lambda$ such that the trace of the cleaned matrix remains equal to $N$. We call this procedure eigenvalue
clipping, and will consider a generalized version where the $(1 - \alpha)N$ largest eigenvalues are kept while the $N\alpha$ smallest
ones are replaced by a common value $\bar\lambda$.
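A minimal sketch of eigenvalue clipping is given below. The function name `clip_eigenvalues` and its two modes (edge-based when `alpha` is not given, rank-based otherwise) are choices made for this illustration; the common value is taken to be the mean of the replaced eigenvalues, which is what keeps the trace unchanged.

```python
import numpy as np

def clip_eigenvalues(E, alpha=None, T=None):
    """Replace the noisy eigenvalues of E by a common value, keeping the trace fixed.

    If alpha is None, every eigenvalue below the Marcenko-Pastur edge (1 + sqrt(q))^2,
    with q = N / T, is treated as noise. Otherwise the alpha * N smallest ones are.
    """
    N = E.shape[0]
    vals, vecs = np.linalg.eigh(E)                 # eigenvalues in ascending order
    if alpha is None:
        edge = (1 + np.sqrt(N / T)) ** 2
        n_noise = int(np.sum(vals < edge))
    else:
        n_noise = int(round(alpha * N))
    cleaned = vals.copy()
    if n_noise > 0:
        cleaned[:n_noise] = vals[:n_noise].mean()  # common value that preserves the trace
    return (vecs * cleaned) @ vecs.T               # eigenvectors are left untouched

# Toy example (assumption): pure-noise returns, so almost everything is clipped.
rng = np.random.default_rng(0)
T, N = 200, 40
x = rng.standard_normal((T, N))
x = (x - x.mean(axis=0)) / x.std(axis=0)
E = x.T @ x / T
E_clipped = clip_eigenvalues(E, T=T)
```

With `alpha=0` the matrix is returned unchanged, and with `alpha=1` all eigenvalues are set to their mean, which gives the identity matrix for a correlation matrix.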

A more sophisticated cleaning is inspired by the power-law distribution model described above. If the true distribution
is given by Eq. (78), then we expect the $k$th eigenvalue $\lambda_k$ to be around the value [70]:



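$$\lambda_k \approx \lambda_0 + \left(A\,\frac{N}{k}\right)^{1/\mu} = 2\alpha - 1 + (1 - \alpha)\sqrt{\frac{N}{k}} \qquad (79)$$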



The "power-law" cleaning procedure is therefore to fix $\mu = 2$ and let $\alpha$ vary to generate a list of synthetic eigenvalues
using Eq. (79) above for $k > 1$, while leaving the corresponding $k$th eigenvector untouched. Since the
market mode $k = 1$ is well determined and appears not to be accounted for by the power-law tail, we leave it as is.
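The power-law substitution can be sketched in a few lines, assuming the synthetic eigenvalues are $\lambda_k = 2\alpha - 1 + (1 - \alpha)\sqrt{N/k}$ for rank $k > 1$ (the $\mu = 2$ case). The function name `power_law_clean` and the toy one-factor data are illustrative assumptions.

```python
import numpy as np

def power_law_clean(E, alpha):
    """Substitute rank-ordered eigenvalues by the mu = 2 power-law values, for k > 1."""
    N = E.shape[0]
    vals, vecs = np.linalg.eigh(E)
    vals, vecs = vals[::-1], vecs[:, ::-1]          # rank k = 1 is the largest eigenvalue
    k = np.arange(1, N + 1)
    synthetic = 2 * alpha - 1 + (1 - alpha) * np.sqrt(N / k)
    synthetic[0] = vals[0]                          # the market mode is left as is
    return (vecs * synthetic) @ vecs.T              # eigenvectors are left untouched

# Toy data (assumption): one market factor plus noise, so that k = 1 is well separated.
rng = np.random.default_rng(0)
T, N = 400, 50
r = 0.5 * rng.standard_normal((T, 1)) + rng.standard_normal((T, N))
x = (r - r.mean(axis=0)) / r.std(axis=0)
E = x.T @ x / T
E_pl = power_law_clean(E, alpha=0.35)
```

Note that the smallest synthetic eigenvalue, at $k = N$, equals $\alpha$, so the cleaned matrix stays positive definite for any $\alpha > 0$.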

We will compare these RMT procedures to two classical, so-called shrinkage algorithms that are discussed in the 
literature (for a review, see [s^l; see also [13, IH, |5§| for alternative proposals and tests). One is to "shrink" the 
empirical correlation matrix E towards the identity matrix: 



$$E \longrightarrow (1 - \alpha)E + \alpha I, \qquad 0 \le \alpha \le 1 \qquad (80)$$



An interpretation of this procedure in terms of a minimal diversification of the optimal portfolio is given in [25]. A
more elaborate one, due to Ledoit and Wolf, is to replace the identity matrix above by a matrix $\bar{C}$ with 1's on the
diagonal and $\bar\rho$ for all off-diagonal elements, where $\bar\rho$ is the average of the pairwise correlation coefficient over all pairs.
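Both shrinkage recipes amount to a convex combination of the empirical matrix with a fixed target, as the following sketch shows. The function names are illustrative, and $\alpha$ is kept as a free parameter here, as in the text, rather than being estimated from the data.

```python
import numpy as np

def shrink_to_identity(E, alpha):
    """Shrink E towards the identity matrix with weight alpha."""
    return (1 - alpha) * E + alpha * np.eye(E.shape[0])

def shrink_to_constant_correlation(E, alpha):
    """Shrink E towards a matrix with 1 on the diagonal and the mean correlation elsewhere."""
    N = E.shape[0]
    rho_bar = (E.sum() - np.trace(E)) / (N * (N - 1))   # average over all off-diagonal pairs
    target = np.full((N, N), rho_bar)
    np.fill_diagonal(target, 1.0)
    return (1 - alpha) * E + alpha * target

E = np.array([[1.0, 0.5, 0.1],
              [0.5, 1.0, 0.3],
              [0.1, 0.3, 1.0]])
half_identity = shrink_to_identity(E, 0.5)               # off-diagonal terms are halved
full_constant = shrink_to_constant_correlation(E, 1.0)   # every off-diagonal term becomes 0.3
```

Shrinking towards the identity rescales every eigenvalue towards 1 without touching the eigenvectors, whereas the constant-correlation target retains a uniform market mode even at $\alpha = 1$.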

This gives us four cleaning procedures, two shrinkage and two RMT schemes. We now need to devise one or several
tests to compare their relative merits. The most natural test that comes to mind is to see how one can improve the
out-of-sample risk of an optimized portfolio, following the discussion given in Sect. III C. However, we need to define
a set of predictors we use for the vector of expected gains $g$. Since many strategies rely in some way or other on
the realized returns, we implement the following investment strategy: each day, the empirical correlation matrix is
constructed using the 1000 previous days, and the expected gains are taken to be proportional to the returns of the
current day, i.e. $g_i = r_i^t / \sqrt{\sum_j (r_j^t)^2}$. The optimal portfolio with a certain gain target is then constructed using Eq. (22)
with a correlation matrix cleaned according to one of the above four recipes. The out-of-sample risk is measured as
the realized variance of those portfolios over the next 99 days. More precisely, this reads:



$$w^t = \frac{E_\alpha^{-1} g^t}{g^{tT} E_\alpha^{-1} g^t} \qquad (81)$$



where $E_\alpha$ is the cleaned correlation matrix, which depends on a parameter $\alpha$ used in the cleaning algorithm (see for
example Eq. (80) above). A nice property of this portfolio is that if the predictors are normalized by their dispersion
[Figure 3 legend: Ledoit-Wolf Shrinkage; Power Law Substitution; Eigenvalue Clipping]

FIG. 3: Comparison between different correlation matrix cleaning schemes for Markowitz optimization. Top curves: out-of-sample
squared risk $\mathcal{R}^2_{\rm out}$ as a function of the cleaning parameter $\alpha$ (see Eq. (80)). $\alpha = 0$ corresponds to the raw empirical
correlation matrix, and $\alpha = 1$ to the identity matrix. The best cleaning corresponds to the smallest out-of-sample risk. The
'true' risk for this problem is $\mathcal{R}_{\rm true} = 1$. Bottom curves: in-sample risk of the optimal portfolio as a function of $\alpha$.



on day $t$, the true risk is $\mathcal{R}_{\rm true} = 1$. The out-of-sample risk is measured as:



$$\mathcal{R}^{t\,2}_{\rm out} = \frac{1}{99} \sum_{t'=t+1}^{t+99} \left[ \sum_i \frac{w_i^t}{\sigma_i^t}\, r_i^{t'} \right]^2 \qquad (82)$$



where $\sigma_i^t$ is the volatility of stock $i$ measured over the last 1000 days (the same period used to measure $E$). The
out-of-sample risk is then averaged over time, and plotted in Fig. 3 as a function of $\alpha$ for the four different recipes. In
all cases but Ledoit-Wolf, $\alpha = 1$ corresponds to $E_\alpha = 1$, the identity matrix (in the case of the power-law method, $\alpha = 1$
corresponds to $\rho_C(\lambda) = \delta(\lambda - 1)$). In this case, $\mathcal{R}^2_{\rm out} \approx 25$, which is very bad, since one does not even account for the
market mode. When $\alpha = 0$, $E_\alpha$ is the raw empirical matrix, except in the power-law method. We show in Fig. 3 the in-sample
risks as well. From the values found for $\alpha = 0$ (no cleaning), one finds that the ratio of out-of-sample to in-sample risk is
$\approx 2.51$, significantly worse than the expected result $1/(1 - q) = 2$. This may be due either to heavy tail effects or to
non-stationary effects (see next subsection). The result of Fig. 3 is that the best cleaning scheme (as measured from
this particular test) is eigenvalue clipping, followed by the power-law method. Shrinkage appears to be less efficient
than RMT-based cleaning; this conclusion is robust against changing the quality factor $Q$. However, other tests can
be devised that lead to slightly different conclusions. One simple variant of the above test is to take for the predictor
$g$ a random vector in $N$ dimensions, uncorrelated with the realized returns. Another idea is to use the correlation
matrix to define residues, i.e. how well the returns of a given stock are explained by the returns of all other stocks
on the same day, excluding itself. The ratio of the out-of-sample to in-sample residual variance is another measure
of the quality of the cleaning. These two alternative tests are in fact found to give very similar results. The best
cleaning recipe now turns out to be the power-law method, while eigenvalue clipping is the worst one. Intuitively,
the difference with the previous test comes from the fact that a random predictor $g$ is (generically) orthogonal to the
top eigenvectors of $E$, whereas a predictor based on the returns themselves has significant overlap with these top
eigenvectors. Therefore, the correct description of the corresponding eigenvalues is more important in the latter case,
whereas the correct treatment of strongly correlated pairs (corresponding to small eigenvalues) is important to keep
the residual variance small.
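One step of the portfolio test described above can be sketched as follows. The sketch assumes that the portfolio is the unit-gain Markowitz solution $w = E_\alpha^{-1} g / (g^T E_\alpha^{-1} g)$ and that the realized risk is the mean squared return of the portfolio, with weights divided by the stock volatilities; the function names and the toy inputs (identity matrix as cleaned correlation, Gaussian returns) are illustrative only.

```python
import numpy as np

def markowitz_weights(E_clean, g):
    """Unit-gain Markowitz portfolio for predictor g and cleaned correlation matrix."""
    x = np.linalg.solve(E_clean, g)      # E_alpha^{-1} g without forming the inverse
    return x / (g @ x)                   # normalized so that the expected gain g.w equals 1

def out_of_sample_risk2(w, sigma, future_returns):
    """Mean squared realized return of the portfolio over the following days."""
    pnl = future_returns @ (w / sigma)   # one portfolio return per future day
    return np.mean(pnl ** 2)

# Toy inputs (assumption): unit volatilities and the identity as cleaned matrix.
rng = np.random.default_rng(0)
N = 20
r_today = rng.standard_normal(N)
g = r_today / np.sqrt(np.sum(r_today ** 2))   # predictor proportional to today's returns
w = markowitz_weights(np.eye(N), g)
sigma = np.ones(N)
future = rng.standard_normal((99, N))         # the next 99 days of returns
risk2 = out_of_sample_risk2(w, sigma, future)
```

In a full test, this step would be repeated every day with the cleaned matrix built from the previous 1000 days, and `risk2` averaged over time for each value of $\alpha$.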

In summary, we have found that RMT-based cleaning recipes are competitive and outperform, albeit only slightly, 
more traditional shrinkage algorithms when applied to portfolio optimization or residue determination. However, 
depending on the objective and/or on the structure of the predictors, the naive eigenvalue clipping method proposed 
in might not be appropriate. In view of both the quality of the fit of the eigenvalue distribution (Fig. 2) and 
the robustness of the results to a change of the testing method, our analysis appears overall to favor the power-law 
cleaning method. However, one should keep in mind that the simple-minded shrinking with $\alpha = 1/2$ is quite robust
and in fact difficult to beat, at least by the above RMT methods that do not attempt to mop up the eigenvectors. 



C. Dynamics of the top eigenvector — an empirical study 



Finally, we investigate possible non-stationary effects in financial markets by studying the dynamics of the top
eigenvalue and eigenvector. In order to even measure these quantities, one needs a certain averaging time scale, noted
$1/\epsilon$ in Sect. IV D above. If the true top eigenvector (or eigenvalue) did not evolve in time, the variograms defined in
FIG. 4: Variogram of the top eigenvector, defined by Eq. (77) (main plot) and of the corresponding eigenvalue, Eq. (77)
(inset). The long term evolution is fitted by an exponential relaxation with a characteristic time around 100 days. Since the
time periods are non overlapping, the value for $\tau = 25$ days should correspond to the asymptotic values in Eqs. (76, 77), but
the latter are much smaller than the empirical values found here. These features show that the structure of the correlation
matrix is itself time dependent.

Eqs. (76, 77) should converge to their asymptotic limits after a time $\tau \sim \epsilon^{-1}$. If the structure of the correlations does
really change over long times, there should be a second relaxation mode for these quantities, leading to an increased 
asymptotic value for the variograms with, possibly, a slower relaxation mode contributing to a long time tail in the 
variogram. Empirical evidence for such a long term evolution of the market mode was presented in fi\. Here, we 
repeat the analysis on the above data set, with now a fixed pool of stocks containing 50 stocks for the whole period. 
The time scale $1/\epsilon$ is chosen to be 25 days. In Fig. 4 we show the variograms, where one clearly sees the existence
of genuine fluctuations of the market mode on a time scale $\sim 100$ days, superimposed on the initial noise dominated
regime that should be described by Eqs. (76, 77). The asymptotic value of these variograms is furthermore much
larger than predicted by these equations. In particular, the variogram of the largest eigenvector should converge to
$\approx 0.08$. One should thus expect that the 'true' correlation matrix $C$ is indeed time dependent with a relatively slow
evolution on average (although correlation 'crashes' have been reported). This genuine non-stationarity of financial 
markets is, to some extent, expected. [Tlj It makes quantitative modelling difficult and sometimes even dangerous; 
even if a perfect cleaning scheme was available, the out-of-sample risk would always be underestimated. New ideas to 
understand, model and predict the correlation dynamics are clearly needed. 
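The type of measurement discussed in this subsection can be sketched as follows. The exact variograms of Eqs. (76, 77) are defined in an earlier section; the definitions used below (mean squared change of the top eigenvalue, and one minus the absolute overlap of the top eigenvectors at lag $\tau$) are assumptions made for this illustration, as are the exponential moving average with rate $\epsilon$, the function names and the stationary toy data.

```python
import numpy as np

def ewma_top_mode(returns, eps):
    """Top eigenvalue and eigenvector of an exponentially weighted correlation matrix."""
    N = returns.shape[1]
    E = np.eye(N)
    top_vals, top_vecs = [], []
    for r in returns:
        E = (1 - eps) * E + eps * np.outer(r, r)   # averaging time scale 1 / eps
        vals, vecs = np.linalg.eigh(E)
        top_vals.append(vals[-1])
        top_vecs.append(vecs[:, -1])
    return np.array(top_vals), np.array(top_vecs)

def eigenvalue_variogram(top_vals, tau):
    """Mean squared change of the top eigenvalue over a lag tau."""
    return np.mean((top_vals[tau:] - top_vals[:-tau]) ** 2)

def eigenvector_variogram(top_vecs, tau):
    """One minus the mean absolute overlap of top eigenvectors separated by tau."""
    overlap = np.abs(np.sum(top_vecs[tau:] * top_vecs[:-tau], axis=1))  # sign-insensitive
    return np.mean(1.0 - overlap)

# Toy data (assumption): a stationary one-factor market, so any variation is pure noise.
rng = np.random.default_rng(0)
n_days, N = 600, 10
returns = 0.7 * rng.standard_normal((n_days, 1)) + rng.standard_normal((n_days, N))
returns = (returns - returns.mean(axis=0)) / returns.std(axis=0)
vals, vecs = ewma_top_mode(returns, eps=1 / 25)
vals, vecs = vals[100:], vecs[100:]                # discard the initial transient
v_short = eigenvector_variogram(vecs, 1)
v_long = eigenvector_variogram(vecs, 50)
```

On stationary data such as this, the variogram saturates once $\tau$ exceeds a few times $1/\epsilon$; a further slow rise at larger lags is the signature of a genuinely evolving market mode.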

VI. SOME OPEN PROBLEMS 

A. Cleaning the eigenvectors? 

As we reviewed in the previous section, RMT has already significantly contributed to improving the reconstruction 
of the correlation matrix from empirical data. However, progress is limited by the fact that most RMT results 
concern eigenvalues but say little about eigenvectors. It is not obvious how to formulate natural priors for the structure
of these eigenvectors - from symmetry alone, one can only argue that the top eigenvector of the correlation matrix
of an ensemble of stocks should be uniform, but even this is not obvious and there is a clear market capitalization
dependence of the weights of the empirical top eigenvector. In order to make headway, one should postulate some a
priori structure, for example factor models, or ultrametric tree models. Whereas our knowledge of the influence of
noise on the eigenvalue spectrum is quite satisfactory, the way 'true' eigenvectors are dressed by measurement noise is
to a large extent unexplored (the case of a well separated top eigenvalue was treated in Sect. IV D above). Statistical
techniques to "clean" eigenvectors with a non trivial structure are needed (for a very recent attempt, see [60]). As
a matter of fact, results concerning the structure of eigenvectors are difficult to obtain as soon as one walks away from the
assumption of statistical invariance under orthogonal transformations. For example, the structure of the eigenvectors
of Levy matrices is extremely rich and numerically displays interesting localization transitions [45]. However, analytical
results are scarce. 
B. Time and frequency dependent correlation matrices

In order to guess correctly the structure of correlations in financial markets, it seems important to understand how 
these correlations appear from the high frequency end. It is clear that prices change because of trades and order 
flow. Correlations in price changes reflect correlations in order flow. Detailed empirical studies of these order flow 
correlations at the tick by tick level are not yet available, but important progress should be witnessed soon. On a 
more phenomenological level, one can gain intuition by postulating that the way stock i moves between t and t + dt, 
measured by the return $r_i^t$, depends on the past returns of all other stocks $j$. If one posits a causal, linear regression
model for the lagged cross-influences, one may write 0]: 

$$r_i^t = \xi_i^t + \sum_j \int_{-\infty}^{+\infty} dt'\, K_{ij}(t - t')\, r_j^{t'} \quad \text{with} \quad K_{ij}(\tau < 0) \equiv 0 \qquad (83)$$

where $\xi_i^t$ represents the idiosyncratic evolution of stock $i$, due to the high frequency component of order flow. For
$dt \to 0$, one may assume for simplicity that these components are uncorrelated in time, i.e.:

$$\langle \xi_i^t\, \xi_j^{t'} \rangle = C^0_{ij}\, \delta(t - t') \qquad (84)$$

where $C^0_{ij}$ is the high frequency "bare" correlation matrix, that comes from simultaneous trading of different stocks.
The matrix $K_{ij}$ describes how the past returns of stock $j$ drive those of stock $i$. The $K_{ij}$ can be thought of as "springs"
that hold the price of different stocks together.

Strongly correlated pairs of stocks are described by a strong cross-influence term $K_{ij}$. Presumably some stocks
are 'leaders' while other, smaller cap stocks, are laggers; this means that in general $K_{ij}(\tau) \neq K_{ji}(\tau)$. Denoting the
Fourier transform of the lag dependent correlation $C_{ij}(\tau)$ (defined by Eq. (PTjl)) as $C_{ij}(\omega)$, one finds:

$$C_{ij}(\omega) = \sum_{kk'} (1 - K(\omega))^{-1}_{ik}\, C^0_{kk'}\, (1 - K(-\omega))^{-1}_{jk'} \qquad (85)$$
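The step from the linear model to this frequency-domain expression is short. The sketch below assumes the model $r_i^t = \xi_i^t + \sum_j \int dt'\, K_{ij}(t - t')\, r_j^{t'}$ with white-in-time noise of covariance $C^0$, and leaves aside Fourier normalization conventions:

```latex
\begin{align*}
r_i(\omega) &= \xi_i(\omega) + \sum_j K_{ij}(\omega)\, r_j(\omega)
  \;\Longrightarrow\; r(\omega) = \left(1 - K(\omega)\right)^{-1} \xi(\omega), \\
C_{ij}(\omega) &\propto \langle r_i(\omega)\, r_j(-\omega) \rangle
  = \sum_{kk'} \left(1 - K(\omega)\right)^{-1}_{ik}
    \langle \xi_k(\omega)\, \xi_{k'}(-\omega) \rangle
    \left(1 - K(-\omega)\right)^{-1}_{jk'}.
\end{align*}
% The convolution becomes a product in Fourier space, and the white-noise
% assumption makes \langle \xi_k(\omega) \xi_{k'}(-\omega) \rangle proportional
% to C^0_{kk'}, independent of \omega.
```

All of the frequency dependence of the correlation matrix therefore comes from the kernel $K$, while $C^0$ only sets the instantaneous structure.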

This model suggests that, arguably, the kernels $K_{ij}(\tau)$ capture more directly the microscopic mechanisms that
construct the low frequency correlation matrix and are fundamental objects that one should aim at modelling, for
example to infer meaningful influence networks. The correlation matrix reflects the way people trade, and possibly 
the correlation between their trading strategies on different stocks. Models that explicitly take into account this 
feedback mechanism between "bare" correlations and the impact of trading only start to be investigated [6l|, [g^, [11] , 
and appear to be particularly fruitful to understand the issue of non-stationarity, i.e. how the correlation matrix itself 
may evolve in time, for example in crisis periods. More work in that direction would certainly be interesting, because
gradual or sudden changes of the correlation matrix are, as noted above, an important concern in risk management.

C. Non-linear correlations and copulas 

We have mentioned above that a full characterization of the correlation structure amounts to specifying a "copula".
In particular, non-linear correlations, such as $\langle r_i^2 r_j^2 \rangle_c$, or tail correlations, can be very important for some applications
(for example, the risk of an option portfolio). We have seen that the Student copula, which is intuitive and popular, 
in fact fails to describe stock data. The construction of an adequate model for the linear and non-linear correlations 
of stocks is still very much an open problem. 

D. Random SVD and Canonical Component Analysis 

Finally, let us mention two problems that may be worth considering, concerning the problem of random singular 
value decomposition. We have seen that in the case where there is no correlation whatsoever between N input and 
$M$ output variables, the spectrum of empirical singular values with $T$ observations is given by Eq. (61), which is the
analogue of the Marcenko-Pastur result. In practical cases, however, there might be some true correlations between 
input and output variables, described by a non-trivial 'true' spectrum of singular values. The modification of this 
spectrum due to measurement noise (i.e. the analogue of Eq. ([SSp *) is, to our knowledge, not known. A detailed 
analysis of the influence of heavy tails in the distribution of both input and output variables would also be quite
satisfying. 